MediaWiki  master
Sanitizer Class Reference

HTML sanitizer for MediaWiki.

Static Public Member Functions

static armorFrenchSpaces ( $text, $space=' ')
 Armor French spaces with a replacement character.
 
static attributeWhitelist ( $element)
 Fetch the whitelist of acceptable attributes for a given element name.
 
static checkCss ( $value)
 Pick apart some CSS and check it for forbidden or unsafe structures.
 
static cleanUrl ( $url)
 
static cleanUrlCallback ( $matches)
 
static cssDecodeCallback ( $matches)
 
static decCharReference ( $codepoint)
 
static decodeChar ( $codepoint)
 Return UTF-8 string for a codepoint if that is a valid character reference, otherwise U+FFFD REPLACEMENT CHARACTER.
 
static decodeCharReferences ( $text)
 Decode any character references, numeric or named entities, in the text and return a UTF-8 string.
 
static decodeCharReferencesAndNormalize ( $text)
 Decode any character references, numeric or named entities, in the text and normalize the resulting string.
 
static decodeCharReferencesCallback ( $matches)
 
static decodeEntity ( $name)
 If the named entity is defined in the HTML 4.0/XHTML 1.0 DTD, return the UTF-8 encoding of that character.
 
static decodeTagAttributes ( $text)
 Return an associative array of attribute names and values from a partial tag string.
 
static encodeAttribute ( $text)
 Encode an attribute value for HTML output.
 
static escapeClass ( $class)
 Given a value, escape it so that it can be used as a CSS class and return it.
 
static escapeHtmlAllowEntities ( $html)
 Given HTML input, escape with htmlspecialchars but un-escape entities.
 
static escapeId ( $id, $options=[])
 Given a value, escape it so that it can be used in an id attribute and return it.
 
static escapeIdForAttribute ( $id, $mode=self::ID_PRIMARY)
 Given a section name or other user-generated or otherwise unsafe string, escapes it to be a valid HTML id attribute.
 
static escapeIdForExternalInterwiki ( $id)
 Given a section name or other user-generated or otherwise unsafe string, escapes it to be a valid URL fragment for external interwikis.
 
static escapeIdForLink ( $id)
 Given a section name or other user-generated or otherwise unsafe string, escapes it to be a valid URL fragment.
 
static escapeIdReferenceList ( $referenceString)
 Given a string containing a space delimited list of ids, escape each id to match ids escaped by the escapeIdForAttribute() function.
 
static fixTagAttributes ( $text, $element, $sorted=false)
 Take a tag soup fragment listing an HTML element's attributes and normalize it to well-formed XML, discarding unwanted attributes.
 
static getAttribNameRegex ()
 Used in Sanitizer::decodeTagAttributes to filter attributes.
 
static getAttribsRegex ()
 Regular expression to match HTML/XML attribute pairs within a tag.
 
static getRecognizedTagData ( $extratags=[], $removetags=[])
 Return the various lists of recognized tags.
 
static hackDocType ()
 Hack up a private DOCTYPE with HTML's standard entity declarations.
 
static hexCharReference ( $codepoint)
 
static isReservedDataAttribute ( $attr)
 Given an attribute name, checks whether it is a reserved data attribute (such as data-mw-foo) which is unavailable to user-generated HTML so MediaWiki core and extension code can safely use it to communicate with frontend code.
 
static mergeAttributes ( $a, $b)
 Merge two sets of HTML attributes.
 
static normalizeCharReferences ( $text)
 Ensure that any entities and character references are legal for XML and XHTML specifically.
 
static normalizeCharReferencesCallback ( $matches)
 
static normalizeCss ( $value)
 Normalize CSS into a format we can easily search for hostile input.
 
static normalizeEntity ( $name)
 If the named entity is defined in the HTML 4.0/XHTML 1.0 DTD, return the equivalent numeric entity reference (except for the core &lt; &gt; &amp; &quot;).
 
static normalizeSectionNameWhitespace ( $section)
 Normalizes whitespace in a section name, such as might be returned by Parser::stripSectionName(), for use in the IDs that are used for section links.
 
static removeHTMLcomments ( $text)
 Remove '<!--', '-->', and everything between.
 
static removeHTMLtags ( $text, $processCallback=null, $args=[], $extratags=[], $removetags=[], $warnCallback=null)
 Cleans up HTML, removes dangerous tags and attributes, and removes HTML comments.
 
static safeEncodeAttribute ( $text)
 Encode an attribute value for HTML tags, with extra armoring against further wiki processing.
 
static safeEncodeTagAttributes ( $assoc_array)
 Build a partial tag string from an associative array of attribute names and values as returned by decodeTagAttributes.
 
static setupAttributeWhitelist ()
 For each array key (an allowed HTML element), return an array of allowed attributes.
 
static stripAllTags ( $html)
 Take a fragment of (potentially invalid) HTML and return a version with any tags removed, encoded as plain text.
 
static validateAttributes ( $attribs, $whitelist)
 Take an array of attribute names and values and normalize or discard illegal values for the given whitelist.
 
static validateEmail ( $addr)
 Does a string look like an e-mail address?
 
static validateTag ( $params, $element)
 Takes attribute names and values for a tag and the tag name and validates that the tag is allowed to be present.
 
static validateTagAttributes ( $attribs, $element)
 Take an array of attribute names and values and normalize or discard illegal values for the given element type.
 

Public Attributes

const CHAR_REFS_REGEX
 Regular expression to match various types of character references in Sanitizer::normalizeCharReferences and Sanitizer::decodeCharReferences.
 
const ELEMENT_BITS_REGEX = '!^(/?)([A-Za-z][^\t\n\v />\0]*+)([^>]*?)(/?>)([^<]*)$!'
 Acceptable tag name charset from HTML5 parsing spec https://www.w3.org/TR/html5/syntax.html#tag-open-state.
 
const EVIL_URI_PATTERN = '!(^|\s|\*/\s*)(javascript|vbscript)([^\w]|$)!i'
 Blacklist for evil URIs like javascript:. WARNING: DO NOT use this in any place that actually requires blacklisting for security reasons.
 
const ID_FALLBACK = 1
 Tells escapeIdForAttribute() to encode the ID using the fallback encoding, or return false if no fallback is configured.
 
const ID_PRIMARY = 0
 Tells escapeIdForAttribute() to encode the ID using the wiki's primary encoding.
 
const XMLNS_ATTRIBUTE_PATTERN = "/^xmlns:[:A-Z_a-z-.0-9]+$/"
 

Static Private Member Functions

static attributeWhitelistInternal ( $element)
 Fetch the whitelist of acceptable attributes for a given element name.
 
static escapeIdInternal ( $id, $mode)
 Helper for escapeIdFor*() functions.
 
static getTagAttributeCallback ( $set)
 Pick the appropriate attribute value from a match set from the attribs regex matches.
 
static normalizeWhitespace ( $text)
 
static setupAttributeWhitelistInternal ()
 For each array key (an allowed HTML element), return an array of allowed attributes.
 
static validateCodepoint ( $codepoint)
 Returns true if a given Unicode codepoint is a valid character in both HTML5 and XML.
 

Private Attributes

const HTML_ENTITIES
 List of all named character entities defined in HTML 4.01 (https://www.w3.org/TR/html4/sgml/entities.html), as well as &apos;, which is only defined starting in XHTML1.
 
const HTML_ENTITY_ALIASES
 Character entity aliases accepted by MediaWiki.
 

Static Private Attributes

static $attribNameRegex
 Lazy-initialised attribute name regex, see getAttribNameRegex()
 
static $attribsRegex
 Lazy-initialised attributes regex, see getAttribsRegex()
 

Detailed Description

HTML sanitizer for MediaWiki.

Definition at line 33 of file Sanitizer.php.

Member Function Documentation

◆ armorFrenchSpaces()

static Sanitizer::armorFrenchSpaces (   $text,
  $space = '&#160;' 
)
static

Armor French spaces with a replacement character.

Since
1.32
Parameters
string $text Text to armor
string $space Space character for the French spaces, defaults to '&#160;'
Returns
string Armored text

Definition at line 1179 of file Sanitizer.php.

Referenced by Parser\internalParseHalfParsed().
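
As a rough illustration, the armoring can be sketched in standalone PHP. The function name armorFrenchSpacesSketch() and the punctuation sets are assumptions made for this example and are not taken from Sanitizer.php; the sketch treats a plain space directly before ? : ; ! % » › or directly after « ‹ as a French space.

```php
<?php
// Hypothetical standalone sketch; not the code in Sanitizer.php.
// Assumption: a "French space" is a plain space directly before one of
// ? : ; ! % » › or directly after « ‹.
// Assumption: $space contains no "$" or "\" (they would be read as
// backreferences by preg_replace).
function armorFrenchSpacesSketch(string $text, string $space = '&#160;'): string {
    // Space before closing punctuation.
    $text = preg_replace('/ (?=[?:;!%»›])/u', $space, $text);
    // Space after an opening guillemet.
    return preg_replace('/([«‹]) /u', '${1}' . $space, $text);
}
```

Other spaces are left alone, so ordinary word spacing is unaffected.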

◆ attributeWhitelist()

static Sanitizer::attributeWhitelist (   $element)
static

Fetch the whitelist of acceptable attributes for a given element name.

Parameters
string $element
Returns
array A sequential array of acceptable attribute names
Deprecated:
since 1.34; should be private

Definition at line 1759 of file Sanitizer.php.

References wfDeprecated().

◆ attributeWhitelistInternal()

static Sanitizer::attributeWhitelistInternal (   $element)
static private

Fetch the whitelist of acceptable attributes for a given element name.

Parameters
string $element
Returns
array An associative array where keys are acceptable attribute names

Definition at line 1772 of file Sanitizer.php.

◆ checkCss()

static Sanitizer::checkCss (   $value)
static

Pick apart some CSS and check it for forbidden or unsafe structures.

Returns a sanitized string. This sanitized string will have character references and escape sequences decoded and comments stripped (unless it is itself one valid comment, in which case the value will be passed through). If the input is just too evil, only a comment complaining about evilness will be returned.

Currently, URL references, 'expression', and 'tps' are forbidden.

NOTE: Despite the fact that character references are decoded, the returned string may contain character references given certain clever input strings. These character references must be escaped before the return value is embedded in HTML.

Parameters
string $value
Returns
string

Definition at line 1065 of file Sanitizer.php.

Referenced by CoreParserFunctions\displaytitle().

◆ cleanUrl()

static Sanitizer::cleanUrl (   $url)
static
Parameters
string $url
Returns
mixed|string

Definition at line 2079 of file Sanitizer.php.

References $matches.

Referenced by Parser\makeFreeExternalLink(), and Parser\replaceExternalLinks().

◆ cleanUrlCallback()

static Sanitizer::cleanUrlCallback (   $matches)
static
Parameters
array $matches
Returns
string

Definition at line 2133 of file Sanitizer.php.

References $matches.

◆ cssDecodeCallback()

static Sanitizer::cssDecodeCallback (   $matches)
static
Parameters
array $matches
Returns
string

Definition at line 1094 of file Sanitizer.php.

References $matches.

◆ decCharReference()

static Sanitizer::decCharReference (   $codepoint)
static
Parameters
int $codepoint
Returns
null|string

Definition at line 1622 of file Sanitizer.php.

◆ decodeChar()

static Sanitizer::decodeChar (   $codepoint)
static

Return UTF-8 string for a codepoint if that is a valid character reference, otherwise U+FFFD REPLACEMENT CHARACTER.

Parameters
int $codepoint
Returns
string
Access:
private

Definition at line 1725 of file Sanitizer.php.

◆ decodeCharReferences()

static Sanitizer::decodeCharReferences (   $text)
static

Decode any character references, numeric or named entities, in the text and return a UTF-8 string.

Parameters
string $text
Returns
string

Definition at line 1669 of file Sanitizer.php.

Referenced by IRCColourfulRCFeedFormatter\cleanupForIRC(), UploadBase\detectScript(), Parser\formatHeadings(), WebRequestUpload\getName(), Parser\getSectionNameFromStrippedText(), ParserOutput\getText(), CleanupImages\processRow(), and Parser\replaceInternalLinks2().

◆ decodeCharReferencesAndNormalize()

static Sanitizer::decodeCharReferencesAndNormalize (   $text)
static

Decode any character references, numeric or named entities, in the text and normalize the resulting string.

(T16952)

This is useful for page titles, not for text to be displayed; MediaWiki allows HTML entities to escape normalization as a feature.

Parameters
string $text Already normalized, containing entities
Returns
string Still normalized, without entities

Definition at line 1686 of file Sanitizer.php.

Referenced by Title\newFromTextThrow(), and MediaWikiTitleCodec\parseTitle().

◆ decodeCharReferencesCallback()

static Sanitizer::decodeCharReferencesCallback (   $matches)
static
Parameters
array $matches
Returns
string

Definition at line 1706 of file Sanitizer.php.

References $matches.

◆ decodeEntity()

static Sanitizer::decodeEntity (   $name)
static

If the named entity is defined in the HTML 4.0/XHTML 1.0 DTD, return the UTF-8 encoding of that character.

Otherwise, returns pseudo-entity source (e.g. "&foo;").

Parameters
string $name
Returns
string

Definition at line 1741 of file Sanitizer.php.

◆ decodeTagAttributes()

static Sanitizer::decodeTagAttributes (   $text)
static

Return an associative array of attribute names and values from a partial tag string.

Attribute names are forced to lowercase, character references are decoded to UTF-8 text.

Parameters
string $text
Returns
array

Definition at line 1450 of file Sanitizer.php.

Referenced by LanguageConverter\autoConvert(), CoreParserFunctions\displaytitle(), Parser\extensionSubstitution(), Parser\extractTagsAndParams(), and ParserOutput\getText().
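
The documented behaviour (lowercased names, decoded character references) can be illustrated with a small standalone sketch. The function name decodeTagAttributesSketch() and its simplified regex are assumptions for this example; the real method uses the pattern returned by getAttribsRegex() and MediaWiki's own character reference decoding.

```php
<?php
// Hypothetical sketch of the documented behaviour; the regex is a
// simplification and is not the one returned by getAttribsRegex().
function decodeTagAttributesSketch(string $text): array {
    $pattern = '/([A-Za-z_:][\w:.-]*)'   // attribute name
        . '(?:\s*=\s*(?:"([^"]*)"'       // double-quoted value
        . "|'([^']*)'"                   // single-quoted value
        . '|([^\s"\'=<>`]+)))?/';        // unquoted value
    preg_match_all($pattern, $text, $matches, PREG_SET_ORDER);
    $attribs = [];
    foreach ($matches as $set) {
        // Only one of the three value groups is non-empty for a given match.
        $raw = $set[4] ?? $set[3] ?? $set[2] ?? '';
        // Names are lowercased; character references become UTF-8 text.
        $attribs[strtolower($set[1])] =
            html_entity_decode($raw, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    }
    return $attribs;
}
```

In this sketch an attribute without a value maps to the empty string, and a repeated name overwrites the earlier entry.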

◆ encodeAttribute()

static Sanitizer::encodeAttribute (   $text)
static

Encode an attribute value for HTML output.

Parameters
string $text
Returns
string HTML-encoded text fragment

Definition at line 1156 of file Sanitizer.php.

Referenced by BenchmarkSanitizer\execute(), Xml\expandAttributes(), and Html\expandAttributes().

◆ escapeClass()

static Sanitizer::escapeClass (   $class)
static

Given a value, escape it so that it can be used as a CSS class and return it.

◆ escapeHtmlAllowEntities()

static Sanitizer::escapeHtmlAllowEntities (   $html)
static

Given HTML input, escape with htmlspecialchars but un-escape entities.

This allows (generally harmless) entities like &#160; to survive.

Parameters
string $html HTML to escape
Returns
string Escaped input

Definition at line 1433 of file Sanitizer.php.

Referenced by Linker\formatComment(), AllMessagesTablePager\formatValue(), and SearchHighlighter\removeWiki().

◆ escapeId()

static Sanitizer::escapeId (   $id,
  $options = [] 
)
static

Given a value, escape it so that it can be used in an id attribute and return it.

This will use HTML5 validation, allowing anything but ASCII whitespace.

To ensure we don't have to bother escaping anything, we also strip ', ". TODO: Is this the best tactic?

We also strip # because it upsets IE, and % because it could be ambiguous if it's part of something that looks like a percent escape (which don't work reliably in fragments cross-browser).

Deprecated:
since 1.30, use one of this class' escapeIdFor*() functions
See also
https://www.w3.org/TR/html401/types.html#type-name Valid characters in the id and name attributes
https://www.w3.org/TR/html401/struct/links.html#h-12.2.3 Anchors with the id attribute
https://www.w3.org/TR/html5/dom.html#the-id-attribute HTML5 definition of id attribute
Parameters
string $id Id to escape
string | array $options String or array of strings (default is []). 'noninitial': this is a non-initial fragment of an id, not a full id, so don't pay attention if the first character isn't valid at the beginning of an id.
Returns
string

Definition at line 1261 of file Sanitizer.php.

◆ escapeIdForAttribute()

static Sanitizer::escapeIdForAttribute (   $id,
  $mode = self::ID_PRIMARY 
)
static

Given a section name or other user-generated or otherwise unsafe string, escapes it to be a valid HTML id attribute.

WARNING: unlike escapeId(), the output of this function is not guaranteed to be HTML safe; be sure to use proper escaping.

Parameters
string $id String to escape
int $mode One of the ID_* constants, specifying whether the primary or fallback encoding should be used.
Returns
string|bool Escaped ID or false if fallback encoding is requested but it's not configured.
Since
1.30

Definition at line 1295 of file Sanitizer.php.

References $wgFragmentMode.

Referenced by Skin\addToSidebarPlain(), HTMLFormFieldCloner\createFieldsForKey(), HTMLForm\displaySection(), SpecialListGrants\execute(), SpecialListGroupRights\execute(), SpecialPasswordPolicies\execute(), Parser\formatHeadings(), HTMLRadioField\formatOptions(), OOUIHTMLForm\formatSection(), HTMLForm\formatSection(), HTMLFormFieldCloner\getCreateButtonHtml(), SpecialVersion\getCreditsForExtension(), HTMLFormFieldCloner\getDeleteButtonHtml(), BaseTemplate\getFooter(), BaseTemplate\getIndicators(), InfoAction\makeHeader(), Parser\makeLegacyAnchor(), and ApiMain\modifyHelp().

◆ escapeIdForExternalInterwiki()

static Sanitizer::escapeIdForExternalInterwiki (   $id)
static

Given a section name or other user-generated or otherwise unsafe string, escapes it to be a valid URL fragment for external interwikis.

Parameters
string $id String to escape
Returns
string Escaped ID
Since
1.30

Definition at line 1345 of file Sanitizer.php.

References $wgExternalInterwikiFragmentMode.

Referenced by Title\getFragmentForURL().

◆ escapeIdForLink()

static Sanitizer::escapeIdForLink (   $id)
static

Given a section name or other user-generated or otherwise unsafe string, escapes it to be a valid URL fragment.

WARNING: unlike escapeId(), the output of this function is not guaranteed to be HTML safe; be sure to use proper escaping.

Parameters
string $id String to escape
Returns
string Escaped ID
Since
1.30

Definition at line 1322 of file Sanitizer.php.

References $wgFragmentMode.

Referenced by Parser\formatHeadings(), Title\getFragmentForURL(), Parser\makeAnchor(), and Parser\makeLegacyAnchor().

◆ escapeIdInternal()

static Sanitizer::escapeIdInternal (   $id,
  $mode 
)
static private

Helper for escapeIdFor*() functions.

Performs most of the actual escaping.

Parameters
string $id String to escape
string $mode One of the modes from $wgFragmentMode
Returns
string

Definition at line 1360 of file Sanitizer.php.

◆ escapeIdReferenceList()

static Sanitizer::escapeIdReferenceList (   $referenceString)
static

Given a string containing a space delimited list of ids, escape each id to match ids escaped by the escapeIdForAttribute() function.

Since
1.27
Parameters
string $referenceString Space delimited list of ids
Returns
string

Definition at line 1391 of file Sanitizer.php.
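
A minimal sketch of the split, escape, and rejoin flow described above. The helper name escapeIdListSketch() and its $escapeOne callback are hypothetical; in MediaWiki the per-id escaping would be done by escapeIdForAttribute().

```php
<?php
// Hypothetical sketch: $escapeOne stands in for the per-id escaper.
function escapeIdListSketch(string $referenceString, callable $escapeOne): string {
    // Split on any run of whitespace and ignore empty pieces.
    $ids = preg_split('/\s+/', $referenceString, -1, PREG_SPLIT_NO_EMPTY);
    // Escape each id on its own, then rejoin with single spaces.
    return implode(' ', array_map($escapeOne, $ids));
}
```

Escaping each id separately keeps the separating spaces out of the escaper, so the result is still a space delimited list.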

◆ fixTagAttributes()

static Sanitizer::fixTagAttributes (   $text,
  $element,
  $sorted = false 
)
static

Take a tag soup fragment listing an HTML element's attributes and normalize it to well-formed XML, discarding unwanted attributes.

Output is safe for further wikitext processing, with escaping of values that could trigger problems.

  • Normalizes attribute names to lowercase
  • Discards attributes not on a whitelist for the given element
  • Turns broken or invalid entities into plaintext
  • Double-quotes all attribute values
  • Attributes without values are given the attribute name as their value
  • Duplicate attributes are discarded
  • Unsafe style attributes are discarded
  • Prepends space if there are attributes.
  • (Optionally) Sorts attributes by name.
Parameters
string $text
string $element
bool $sorted Whether to sort the attributes (default: false)
Returns
string

Definition at line 1136 of file Sanitizer.php.

Referenced by Parser\doTableStuff().

◆ getAttribNameRegex()

static Sanitizer::getAttribNameRegex ( )
static

Used in Sanitizer::decodeTagAttributes to filter attributes.

Returns
string

Definition at line 385 of file Sanitizer.php.

◆ getAttribsRegex()

static Sanitizer::getAttribsRegex ( )
static

Regular expression to match HTML/XML attribute pairs within a tag.

Based on https://www.w3.org/TR/html5/syntax.html#before-attribute-name-state. Used in Sanitizer::decodeTagAttributes.

Returns
string

Definition at line 356 of file Sanitizer.php.

◆ getRecognizedTagData()

static Sanitizer::getRecognizedTagData (   $extratags = [],
  $removetags = [] 
)
static

Return the various lists of recognized tags.

Parameters
array $extratags For any extra tags to include
array $removetags For any tags (default or extra) to exclude
Returns
array

Definition at line 400 of file Sanitizer.php.

References $wgAllowImageTag.

◆ getTagAttributeCallback()

static Sanitizer::getTagAttributeCallback (   $set)
static private

Pick the appropriate attribute value from a match set from the attribs regex matches.

Parameters
array $set
Exceptions
MWException When tag conditions are not met.
Returns
string

Definition at line 1511 of file Sanitizer.php.

◆ hackDocType()

static Sanitizer::hackDocType ( )
static

Hack up a private DOCTYPE with HTML's standard entity declarations.

PHP 4 seemed to know these if you gave it an HTML doctype, but PHP 5.1 doesn't.

Use for passing XHTML fragments to PHP's XML parsing functions

Returns
string

Definition at line 2066 of file Sanitizer.php.

Referenced by Xml\isWellFormedXmlFragment().

◆ hexCharReference()

static Sanitizer::hexCharReference (   $codepoint)
static
Parameters
int $codepoint
Returns
null|string

Definition at line 1635 of file Sanitizer.php.

◆ isReservedDataAttribute()

static Sanitizer::isReservedDataAttribute (   $attr)
static

Given an attribute name, checks whether it is a reserved data attribute (such as data-mw-foo) which is unavailable to user-generated HTML so MediaWiki core and extension code can safely use it to communicate with frontend code.

Parameters
string $attr Attribute name.
Returns
bool

Definition at line 915 of file Sanitizer.php.

Referenced by EnhancedChangesList\recentChangesBlockLine().
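
A sketch of what such a check might look like. The set of reserved prefixes used here (data-mw, data-ooui, data-parsoid) is an assumption for the example; only data-mw-foo is named in the description above. The function name is hypothetical.

```php
<?php
// Hypothetical sketch; the list of reserved prefixes is an assumption.
function isReservedDataAttributeSketch(string $attr): bool {
    // Case-insensitive prefix match on the attribute name.
    return preg_match('/^data-(ooui|mw|parsoid)/i', $attr) === 1;
}
```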

◆ mergeAttributes()

static Sanitizer::mergeAttributes (   $a,
  $b 
)
static

Merge two sets of HTML attributes.

Conflicting items in the second set will override those in the first, except for 'class' attributes which will be combined (if they're both strings).

Todo:
implement merging for other attributes such as style
Parameters
array $a
array $b
Returns
array

Definition at line 936 of file Sanitizer.php.

Referenced by OutputPage\headElement(), MediaWiki\Linker\LinkRenderer\mergeAttribs(), MediaWiki\EditPage\TextboxBuilder\mergeClassesIntoAttributes(), and TraditionalImageGallery\toHTML().
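
The merge rule can be illustrated with a short standalone sketch. mergeAttributesSketch() is a hypothetical stand-in, not the code in Sanitizer.php, and its removal of duplicate class names is an assumption of the example.

```php
<?php
// Hypothetical stand-in for the documented merge rule.
function mergeAttributesSketch(array $a, array $b): array {
    // On conflict, values from the second set win.
    $out = array_merge($a, $b);
    if (isset($a['class'], $b['class'])
        && is_string($a['class']) && is_string($b['class'])
    ) {
        // 'class' is combined instead of overridden (duplicates dropped here).
        $classes = preg_split(
            '/\s+/', "{$a['class']} {$b['class']}", -1, PREG_SPLIT_NO_EMPTY
        );
        $out['class'] = implode(' ', array_unique($classes));
    }
    return $out;
}
```

For example, merging ['class' => 'a b', 'id' => 'x'] with ['class' => 'b c', 'id' => 'y'] keeps the second id and the union of the classes.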

◆ normalizeCharReferences()

static Sanitizer::normalizeCharReferences (   $text)
static

Ensure that any entities and character references are legal for XML and XHTML specifically.

Any stray bits will be &-escaped to result in a valid text fragment.

a. Named char refs can only be &lt; &gt; &amp; &quot;; others are numericized (this way we're well-formed even without a DTD)
b. Any numeric char refs must be legal chars, not invalid or forbidden
c. Use lower-cased "&#x", not "&#X"
d. Fix or reject non-valid attributes

Parameters
string $text
Returns
string
Access:
private

Definition at line 1569 of file Sanitizer.php.

Referenced by CoreParserFunctions\displaytitle(), OutputPage\getDisplayTitle(), Parser\internalParseHalfParsed(), and OutputPage\setPageTitle().
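
Rules a and c, together with the escaping of stray ampersands, can be sketched as a single callback-based replacement. Everything named here is hypothetical: normalizeCharRefsSketch() is not the MediaWiki method, its two-entry entity table is a tiny illustrative subset, and the codepoint validation of rule b is left out.

```php
<?php
// Hypothetical sketch; rule b (codepoint validation) is omitted.
function normalizeCharRefsSketch(string $text): string {
    $core = ['lt', 'gt', 'amp', 'quot'];       // kept as named references
    $known = ['nbsp' => 160, 'eacute' => 233]; // illustrative subset only
    return preg_replace_callback(
        '/&([A-Za-z0-9]+);|&\#([0-9]+);|&\#[xX]([0-9A-Fa-f]+);|(&)/',
        function (array $m) use ($core, $known): string {
            if (($m[1] ?? '') !== '') {
                if (in_array($m[1], $core, true)) {
                    return "&{$m[1]};";            // core entity: unchanged
                }
                if (isset($known[$m[1]])) {
                    return "&#{$known[$m[1]]};";   // numericize known names
                }
                return "&amp;{$m[1]};";            // unknown name: escape the &
            }
            if (($m[2] ?? '') !== '') {
                return '&#' . (int)$m[2] . ';';    // decimal reference
            }
            if (($m[3] ?? '') !== '') {
                return '&#x' . strtolower($m[3]) . ';'; // lower-case hex form
            }
            return '&amp;';                        // stray ampersand
        },
        $text
    );
}
```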

◆ normalizeCharReferencesCallback()

static Sanitizer::normalizeCharReferencesCallback (   $matches)
static
Parameters
array $matches
Returns
string

Definition at line 1580 of file Sanitizer.php.

References $matches.

◆ normalizeCss()

static Sanitizer::normalizeCss (   $value)
static

Normalize CSS into a format we can easily search for hostile input.

  • decode character references
  • decode escape sequences
  • convert characters that IE6 interprets into ascii
  • remove comments, unless the entire value is one single comment
Parameters
string $value The CSS string
Returns
string Normalized CSS

Definition at line 958 of file Sanitizer.php.

References $matches, and StringUtils\delimiterReplace().

Referenced by UploadBase\checkSvgScriptCallback().

◆ normalizeEntity()

static Sanitizer::normalizeEntity (   $name)
static

If the named entity is defined in the HTML 4.0/XHTML 1.0 DTD, return the equivalent numeric entity reference (except for the core &lt; &gt; &amp; &quot;).

If the entity is a MediaWiki-specific alias, returns the HTML equivalent. Otherwise, returns HTML-escaped text of pseudo-entity source (e.g. &amp;foo;).

Parameters
string $name
Returns
string

Definition at line 1606 of file Sanitizer.php.

◆ normalizeSectionNameWhitespace()

static Sanitizer::normalizeSectionNameWhitespace (   $section)
static

Normalizes whitespace in a section name, such as might be returned by Parser::stripSectionName(), for use in the IDs that are used for section links.

Parameters
string $section
Returns
string

Definition at line 1550 of file Sanitizer.php.

Referenced by Parser\formatHeadings(), and Parser\getSectionNameFromStrippedText().

◆ normalizeWhitespace()

static Sanitizer::normalizeWhitespace (   $text)
static private
Parameters
string $text
Returns
string

Definition at line 1535 of file Sanitizer.php.

◆ removeHTMLcomments()

static Sanitizer::removeHTMLcomments (   $text)
static

Remove '<!--', '-->', and everything between.

To avoid leaving blank lines, when a comment is both preceded and followed by a newline (ignoring spaces), trim leading and trailing spaces and one of the newlines.

Parameters
string $text
Returns
string

Definition at line 709 of file Sanitizer.php.
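
The newline handling described above can be approximated with two regular-expression passes. This is a simplified sketch under stated assumptions: removeCommentsSketch() is a hypothetical name, unclosed comments are left in place, and the real method may differ in detail.

```php
<?php
// Hypothetical sketch; unclosed comments are left untouched here.
function removeCommentsSketch(string $text): string {
    // A comment body that cannot run past the first closing delimiter.
    $comment = '<!--(?:(?!-->).)*-->';
    // Comment alone between two newlines: drop it with its surrounding
    // spaces and one of the newlines, so no blank line is left behind.
    $text = preg_replace('/(?<=\n) *' . $comment . ' *\n/s', '', $text);
    // Any remaining comment is removed in place.
    return preg_replace('/' . $comment . '/s', '', $text);
}
```

An inline comment therefore leaves its neighbouring spaces untouched, while a comment on its own line disappears together with that line.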

◆ removeHTMLtags()

static Sanitizer::removeHTMLtags (   $text,
  $processCallback = null,
  $args = [],
  $extratags = [],
  $removetags = [],
  $warnCallback = null 
)
static

Cleans up HTML, removes dangerous tags and attributes, and removes HTML comments.

Parameters
string $text
callable | null $processCallback Callback to do any variable or parameter replacements in HTML attribute values
array | bool $args Arguments for the processing callback
array $extratags For any extra tags to include
array $removetags For any tags (default or extra) to exclude
callable | null $warnCallback (Deprecated) Callback allowing the addition of a tracking category when bad input is encountered. DO NOT ADD NEW PARAMETERS AFTER $warnCallback, since it will be removed shortly.
Returns
string

Definition at line 497 of file Sanitizer.php.

References $args, $t, MWTidy\isEnabled(), and wfDeprecated().

Referenced by CoreParserFunctions\displaytitle(), BenchmarkSanitizer\execute(), OutputPage\getDisplayTitle(), Parser\internalParse(), OutputPage\setPageTitle(), and Parser\testSrvus().

◆ safeEncodeAttribute()

static Sanitizer::safeEncodeAttribute (   $text)
static

Encode an attribute value for HTML tags, with extra armoring against further wiki processing.

Parameters
string $text
Returns
string HTML-encoded text fragment

Definition at line 1199 of file Sanitizer.php.

References $matches, and wfUrlProtocols().

Referenced by CoreParserFunctions\anchorencode(), and BenchmarkSanitizer\execute().

◆ safeEncodeTagAttributes()

static Sanitizer::safeEncodeTagAttributes (   $assoc_array)
static

Build a partial tag string from an associative array of attribute names and values as returned by decodeTagAttributes.

Parameters
array $assoc_array
Returns
string

Definition at line 1492 of file Sanitizer.php.

Referenced by CoreParserFunctions\displaytitle().

◆ setupAttributeWhitelist()

static Sanitizer::setupAttributeWhitelist ( )
static

For each array key (an allowed HTML element), return an array of allowed attributes.

Returns
array
Deprecated:
since 1.34; should be private

Definition at line 1783 of file Sanitizer.php.

References wfDeprecated().

◆ setupAttributeWhitelistInternal()

static Sanitizer::setupAttributeWhitelistInternal ( )
static private

For each array key (an allowed HTML element), return an array of allowed attributes.

Returns
array An associative array: keys are HTML element names; values are associative arrays where the keys are allowed attribute names.

Definition at line 1800 of file Sanitizer.php.

◆ stripAllTags()

static Sanitizer::stripAllTags (   $html)
static

Take a fragment of (potentially invalid) HTML and return a version with any tags removed, encoded as plain text.

Warning: this return value must be further escaped for literal inclusion in HTML output as of 1.10!

Parameters
string $html HTML fragment
Returns
string

Definition at line 2041 of file Sanitizer.php.

Referenced by UploadStashException\__construct(), LocalizedException\__construct(), MWDebug\appendDebugInfoToApiResult(), CoreParserFunctions\displaytitle(), BenchmarkSanitizer\execute(), WikiTextStructure\extractHeadingBeforeFirstHeading(), WikiTextStructure\extractWikitextParts(), ChangesListSpecialPage\getChangeTagList(), CliInstaller\getMessageText(), WikiTextStructure\headings(), OutputPage\setPageTitle(), Parser\stripAltText(), ApiErrorFormatter\stripMarkup(), and ChangeTags\truncateTagDescription().

◆ validateAttributes()

static Sanitizer::validateAttributes (   $attribs,
  $whitelist 
)
static

Take an array of attribute names and values and normalize or discard illegal values for the given whitelist.

  • Discards attributes not on the given whitelist
  • Unsafe style attributes are discarded
  • Invalid id attributes are re-encoded
Parameters
array $attribs
array $whitelist List of allowed attribute names, either as a sequential array of valid attribute names or as an associative array where keys give valid attribute names
Returns
array
Todo:

Check for legal values where the DTD limits things.

Check for unique id attribute :P

Definition at line 813 of file Sanitizer.php.

References wfUrlProtocols().

◆ validateCodepoint()

static Sanitizer::validateCodepoint (   $codepoint)
static private

Returns true if a given Unicode codepoint is a valid character in both HTML5 and XML.

Parameters
int $codepoint
Returns
bool

Definition at line 1650 of file Sanitizer.php.
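
One plausible way to express such a check is as a set of codepoint ranges. The ranges below are an assumption for illustration: they start from the XML 1.0 Char production and additionally exclude U+000D and the U+007F to U+009F control range. The function name is hypothetical.

```php
<?php
// Hypothetical sketch; the exact ranges are an assumption, not quoted
// from Sanitizer.php.
function validateCodepointSketch(int $cp): bool {
    return $cp === 0x09 || $cp === 0x0A           // tab, line feed
        || ($cp >= 0x20 && $cp <= 0x7E)           // printable ASCII
        || ($cp >= 0xA0 && $cp <= 0xD7FF)         // up to the surrogate block
        || ($cp >= 0xE000 && $cp <= 0xFFFD)       // after the surrogates
        || ($cp >= 0x10000 && $cp <= 0x10FFFF);   // supplementary planes
}
```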

◆ validateEmail()

static Sanitizer::validateEmail (   $addr)
static

Does a string look like an e-mail address?

This validates an email address using an HTML5 specification found at http://www.whatwg.org/html/states-of-the-type-attribute.html#valid-e-mail-address, which as of 2011-01-24 says:

A valid e-mail address is a string that matches the ABNF production 1*( atext / "." ) "@" ldh-str *( "." ldh-str ) where atext is defined in RFC 5322 section 3.2.3, and ldh-str is defined in RFC 1034 section 3.5.

This function is an implementation of the specification as requested in T24449.

Client-side forms will use the same standard validation rules via JS or HTML 5 validation; additional restrictions can be enforced server-side by extensions via the 'isValidEmailAddr' hook.

Note that this validation doesn't 100% match RFC 2822, but is believed to be liberal enough for wide use. Some invalid addresses will still pass validation here.

Since
1.18
Parameters
string $addr E-mail address
Returns
bool

Definition at line 2165 of file Sanitizer.php.

References Hooks\run().

Referenced by SpecialChangeEmail\attemptChange(), Autopromote\checkCondition(), RemoveInvalidEmails\execute(), BenchmarkSanitizer\execute(), ResetUserEmail\execute(), SpecialConfirmEmail\execute(), PasswordReset\execute(), LoginSignupSpecialPage\getFieldDefinitions(), User\isEmailConfirmed(), MediaWiki\Auth\UserDataAuthenticationRequest\populateUser(), and WebInstallerName\submit().
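
The quoted ABNF production translates fairly directly into a regular expression. The sketch below is an illustration only: looksLikeEmailSketch() is a hypothetical name, the character classes are written out by hand from the cited RFC definitions of atext and ldh-str, and the 'isValidEmailAddr' hook is not modelled.

```php
<?php
// Hypothetical sketch of 1*( atext / "." ) "@" ldh-str *( "." ldh-str ).
function looksLikeEmailSketch(string $addr): bool {
    // atext per RFC 5322 section 3.2.3, written as a character class body.
    $atext = 'a-zA-Z0-9!#$%&\'*+\-\/=?^_`{|}~';
    // ldh-str per RFC 1034 section 3.5: letters, digits, hyphens.
    $ldh = 'a-zA-Z0-9\-';
    $pattern = '/^[' . $atext . '.]+@[' . $ldh . ']+(?:\.[' . $ldh . ']+)*$/D';
    return preg_match($pattern, $addr) === 1;
}
```

As the description notes, this is deliberately liberal: an address such as a@b passes, because the production does not require a dot in the domain.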

◆ validateTag()

static Sanitizer::validateTag (   $params,
  $element 
)
static

Takes attribute names and values for a tag and the tag name and validates that the tag is allowed to be present.

This DOES NOT validate the attributes, nor does it validate the tags themselves. This method only handles the special circumstances where we may want to allow a tag within content but ONLY when it has specific attributes set.

Parameters
string $params
string $element
Returns
bool

Definition at line 755 of file Sanitizer.php.

◆ validateTagAttributes()

static Sanitizer::validateTagAttributes (   $attribs,
  $element 
)
static

Take an array of attribute names and values and normalize or discard illegal values for the given element type.

  • Discards attributes not on a whitelist for the given element
  • Unsafe style attributes are discarded
  • Invalid id attributes are re-encoded
Parameters
array $attribs
string $element
Returns
array
Todo:

Check for legal values where the DTD limits things.

Check for unique id attribute :P

Definition at line 791 of file Sanitizer.php.

Referenced by CoreTagHooks\pre(), and Parser\renderImageGallery().

Member Data Documentation

◆ $attribNameRegex

Sanitizer::$attribNameRegex
static private

Lazy-initialised attribute name regex, see getAttribNameRegex()

Definition at line 379 of file Sanitizer.php.

◆ $attribsRegex

Sanitizer::$attribsRegex
static private

Lazy-initialised attributes regex, see getAttribsRegex()

Definition at line 342 of file Sanitizer.php.

◆ CHAR_REFS_REGEX

const Sanitizer::CHAR_REFS_REGEX
Initial value:
=
'/&([A-Za-z0-9\x80-\xff]+);
|&\#([0-9]+);
|&\#[xX]([0-9A-Fa-f]+);
|(&)/x'

Regular expression to match various types of character references in Sanitizer::normalizeCharReferences and Sanitizer::decodeCharReferences.

Definition at line 38 of file Sanitizer.php.
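
To show how the four alternatives of this pattern map to capture groups, here is a small standalone sketch. The constant CHAR_REFS_REGEX_SKETCH copies the initial value shown above, and classifyCharRefs() is a hypothetical helper written for this example.

```php
<?php
// Copy of the pattern shown above, for use outside the class.
const CHAR_REFS_REGEX_SKETCH =
    '/&([A-Za-z0-9\x80-\xff]+);
     |&\#([0-9]+);
     |&\#[xX]([0-9A-Fa-f]+);
     |(&)/x';

// Hypothetical helper: label each match by the group that captured it.
function classifyCharRefs(string $text): array {
    preg_match_all(CHAR_REFS_REGEX_SKETCH, $text, $matches, PREG_SET_ORDER);
    $kinds = [];
    foreach ($matches as $set) {
        if (($set[1] ?? '') !== '') {
            $kinds[] = 'named:' . $set[1];
        } elseif (($set[2] ?? '') !== '') {
            $kinds[] = 'decimal:' . $set[2];
        } elseif (($set[3] ?? '') !== '') {
            $kinds[] = 'hex:' . $set[3];
        } else {
            $kinds[] = 'stray-ampersand';
        }
    }
    return $kinds;
}
```

Group 1 holds a named reference, group 2 a decimal one, group 3 a hexadecimal one, and group 4 a bare ampersand that is not part of any reference.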

◆ ELEMENT_BITS_REGEX

const Sanitizer::ELEMENT_BITS_REGEX = '!^(/?)([A-Za-z][^\t\n\v />\0]*+)([^>]*?)(/?>)([^<]*)$!'

Acceptable tag name charset from HTML5 parsing spec https://www.w3.org/TR/html5/syntax.html#tag-open-state.

Definition at line 48 of file Sanitizer.php.
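
A short sketch of how this pattern splits the text that follows a '<' into its parts. The constant copies the value shown above; splitElementBits() is a hypothetical helper, and feeding it the text after each '<' is an assumption about how the pattern is meant to be applied.

```php
<?php
// Copy of the pattern shown above, for use outside the class.
const ELEMENT_BITS_REGEX_SKETCH =
    '!^(/?)([A-Za-z][^\t\n\v />\0]*+)([^>]*?)(/?>)([^<]*)$!';

// Hypothetical helper: returns [slash, name, params, brace, rest] or null.
function splitElementBits(string $afterLessThan): ?array {
    if (!preg_match(ELEMENT_BITS_REGEX_SKETCH, $afterLessThan, $m)) {
        return null; // not something that looks like a tag
    }
    return array_slice($m, 1);
}
```

The five groups are the optional closing slash, the tag name, the raw attribute text, the closing brace (with an optional self-closing slash), and the text up to the next '<'.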

◆ EVIL_URI_PATTERN

const Sanitizer::EVIL_URI_PATTERN = '!(^|\s|\*/\s*)(javascript|vbscript)([^\w]|$)!i'

Blacklist for evil URIs like javascript:. WARNING: DO NOT use this in any place that actually requires blacklisting for security reasons.

There are numerous ways to bypass blacklisting; the only way to be secure from javascript: URI-based XSS vectors is to whitelist things that you know are safe and deny everything else.

Definition at line 58 of file Sanitizer.php.
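
The following sketch shows what the pattern does and does not match, which may help explain the warning. The constant copies the value shown above and matchesEvilUriPattern() is a hypothetical helper; the embedded-tab input is just one example of a variant that slips past the pattern.

```php
<?php
// Copy of the pattern shown above, for use outside the class.
const EVIL_URI_PATTERN_SKETCH =
    '!(^|\s|\*/\s*)(javascript|vbscript)([^\w]|$)!i';

// Hypothetical helper wrapping the match.
function matchesEvilUriPattern(string $value): bool {
    return preg_match(EVIL_URI_PATTERN_SKETCH, $value) === 1;
}
```

Plain and mixed-case javascript: values match, but a value such as "java\tscript:alert(1)" does not, so the pattern is best treated as a hint and not as a security boundary.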

◆ HTML_ENTITIES

const Sanitizer::HTML_ENTITIES
private
Initial value:
= [
'Aacute' => 193

List of all named character entities defined in HTML 4.01 (https://www.w3.org/TR/html4/sgml/entities.html), as well as &apos;, which is only defined starting in XHTML1.

Definition at line 81 of file Sanitizer.php.

◆ HTML_ENTITY_ALIASES

const Sanitizer::HTML_ENTITY_ALIASES
private
Initial value:
= [
'רלמ' => 'rlm'

Character entity aliases accepted by MediaWiki.

Definition at line 340 of file Sanitizer.php.

◆ ID_FALLBACK

const Sanitizer::ID_FALLBACK = 1

Tells escapeIdForAttribute() to encode the ID using the fallback encoding, or return false if no fallback is configured.

Since
1.30

Definition at line 74 of file Sanitizer.php.

Referenced by Parser\formatHeadings(), Parser\makeLegacyAnchor(), and ApiMain\modifyHelp().

◆ ID_PRIMARY

const Sanitizer::ID_PRIMARY = 0

Tells escapeIdForAttribute() to encode the ID using the wiki's primary encoding.

Since
1.30

Definition at line 66 of file Sanitizer.php.

Referenced by Parser\formatHeadings(), and ApiMain\modifyHelp().

◆ XMLNS_ATTRIBUTE_PATTERN

const Sanitizer::XMLNS_ATTRIBUTE_PATTERN = "/^xmlns:[:A-Z_a-z-.0-9]+$/"

Definition at line 59 of file Sanitizer.php.


The documentation for this class was generated from the following file: