MediaWiki master
HTML sanitizer for MediaWiki.
Static Public Member Functions | |
static | armorFrenchSpaces ( $text, $space=' ') |
Armor French spaces with a replacement character. | |
static | checkCss ( $value) |
Pick apart some CSS and check it for forbidden or unsafe structures. | |
static | cleanUrl ( $url) |
static | decodeCharReferences ( $text) |
Decode any character references, numeric or named entities, in the text and return a UTF-8 string. | |
static | decodeCharReferencesAndNormalize ( $text) |
Decode any character references, numeric or named entities, in the text and normalize the resulting string. | |
static | decodeTagAttributes ( $text) |
Return an associative array of attribute names and values from a partial tag string. | |
static | encodeAttribute ( $text) |
Encode an attribute value for HTML output. | |
static | escapeClass ( $class) |
Given a value, escape it so that it can be used as a CSS class and return it. | |
static | escapeHtmlAllowEntities ( $html) |
Given HTML input, escape with htmlspecialchars but un-escape entities. | |
static | escapeIdForAttribute ( $id, $mode=self::ID_PRIMARY) |
Given a section name or other user-generated or otherwise unsafe string, escapes it to be a valid HTML id attribute. | |
static | escapeIdForExternalInterwiki ( $id) |
Given a section name or other user-generated or otherwise unsafe string, escapes it to be a valid URL fragment for external interwikis. | |
static | escapeIdForLink ( $id) |
Given a section name or other user-generated or otherwise unsafe string, escapes it to be a valid URL fragment. | |
static | fixTagAttributes ( $text, $element, $sorted=false) |
Take a tag soup fragment listing an HTML element's attributes and normalize it to well-formed XML, discarding unwanted attributes. | |
static | getRecognizedTagData ( $extratags=[], $removetags=[]) |
Return the various lists of recognized tags. | |
static | hackDocType () |
Hack up a private DOCTYPE with HTML's standard entity declarations. | |
static | internalRemoveHtmlTags ( $text, $processCallback=null, $args=[], $extratags=[], $removetags=[]) |
Cleans up HTML, removes dangerous tags and attributes, and removes HTML comments; BEWARE there may be unmatched HTML tags in the result. | |
static | isReservedDataAttribute ( $attr) |
Given an attribute name, checks whether it is a reserved data attribute (such as data-mw-foo) which is unavailable to user-generated HTML so MediaWiki core and extension code can safely use it to communicate with frontend code. | |
static | mergeAttributes ( $a, $b) |
Merge two sets of HTML attributes. | |
static | normalizeCharReferences ( $text) |
Ensure that any entities and character references are legal for XML and XHTML specifically. | |
static | normalizeCss ( $value) |
Normalize CSS into a format we can easily search for hostile input. | |
static | normalizeSectionNameWhitespace ( $section) |
Normalizes whitespace in a section name, such as might be returned by Parser::stripSectionName(), for use in the ids that are used for section links. | |
static | removeHTMLcomments ( $text) |
Remove '<!--', '-->', and everything between. | |
static | removeHTMLtags ( $text, $processCallback=null, $args=[], $extratags=[], $removetags=[]) |
Cleans up HTML, removes dangerous tags and attributes, and removes HTML comments; BEWARE there may be unmatched HTML tags in the result. | |
static | removeSomeTags (string $text, array $options=[]) |
Cleans up HTML, removes dangerous tags and attributes, and removes HTML comments; the result will always be balanced and tidy HTML. | |
static | safeEncodeAttribute ( $text) |
Encode an attribute value for HTML tags, with extra armoring against further wiki processing. | |
static | safeEncodeTagAttributes ( $assoc_array) |
Build a partial tag string from an associative array of attribute names and values as returned by decodeTagAttributes. | |
static | stripAllTags ( $html) |
Take a fragment of (potentially invalid) HTML and return a version with any tags removed, encoded as plain text. | |
static | validateAttributes ( $attribs, $allowed) |
Take an array of attribute names and values and normalize or discard illegal values. | |
static | validateEmail ( $addr) |
Does a string look like an e-mail address? | |
static | validateTagAttributes ( $attribs, $element) |
Take an array of attribute names and values and normalize or discard illegal values for the given element type. | |
Public Attributes | |
const | ID_FALLBACK = 1 |
Tells escapeIdForAttribute() to encode the ID using the fallback encoding, or return false if no fallback is configured. | |
const | ID_PRIMARY = 0 |
Tells escapeIdForAttribute() to encode the ID using the wiki's primary encoding. | |
HTML sanitizer for MediaWiki.
Definition at line 46 of file Sanitizer.php.
static MediaWiki\Parser\Sanitizer::armorFrenchSpaces ( $text, $space=' ')
Armor French spaces with a replacement character.
string | $text | Text to armor |
string | $space | Space character for the French spaces, defaults to ' ' |
Definition at line 888 of file Sanitizer.php.
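A minimal Python sketch of the transformation (not the PHP implementation). It assumes a "French space" is a plain space before ?, :, ;, !, %, » or › (when that mark is not followed by a word character), or a space right after « or ‹. The default for space is a literal non-breaking space, on the assumption that the ' ' shown above is U+00A0; the helper name is hypothetical.

```python
import re

# Assumed patterns for "French spaces":
# a space before ? : ; ! % » › that is not followed by a word character,
_BEFORE_PUNCT = re.compile(r" (?=[?:;!%»›](?!\w))")
# and a space right after an opening guillemet « or ‹.
_AFTER_GUILLEMET = re.compile(r"([«‹]) ")


def armor_french_spaces(text: str, space: str = "\u00a0") -> str:
    """Hypothetical sketch: replace French spaces in text with `space`."""
    # Function replacements keep `space` literal even if it contains a backslash.
    text = _BEFORE_PUNCT.sub(lambda m: space, text)
    return _AFTER_GUILLEMET.sub(lambda m: m.group(1) + space, text)
```

For example, armor_french_spaces("Vraiment ?", "&#160;") yields "Vraiment&#160;?", while "a :b" is left alone because the colon is followed by a word character.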
static MediaWiki\Parser\Sanitizer::checkCss ( $value)
Pick apart some CSS and check it for forbidden or unsafe structures.
Returns a sanitized string. This sanitized string will have character references and escape sequences decoded and comments stripped (unless it is itself one valid comment, in which case the value will be passed through). If the input is just too evil, only a comment complaining about evilness will be returned.
Currently URL references, 'expression', 'tps' are forbidden.
NOTE: Despite the fact that character references are decoded, the returned string may contain character references given certain clever input strings. These character references must be escaped before the return value is embedded in HTML.
string | $value |
Definition at line 771 of file Sanitizer.php.
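A heavily simplified Python sketch of the order of operations described above: pass a lone comment through, decode CSS escapes, strip comments, then search for forbidden constructs. The short blocklist and the "/* insecure input */" placeholder are assumptions. The real method also decodes character references and checks many more patterns, so treat this only as an illustration of why decoding must happen before searching.

```python
import re

# Hypothetical, simplified blocklist; the real list is longer.
_FORBIDDEN = re.compile(r"expression|url\s*\(|image\s*\(|image-set\s*\(", re.IGNORECASE)
# A value consisting of exactly one /* ... */ comment.
_ONE_COMMENT = re.compile(r"\s*/\*(?:(?!\*/).)*\*/\s*", re.DOTALL)


def _decode_css_escapes(value: str) -> str:
    # A CSS escape is a backslash, 1-6 hex digits and an optional whitespace char.
    def repl(m: re.Match) -> str:
        code = int(m.group(1), 16)
        return chr(code) if 0 < code <= 0x10FFFF else "\ufffd"

    return re.sub(r"\\([0-9a-fA-F]{1,6})\s?", repl, value)


def check_css(value: str) -> str:
    """Hypothetical sketch of a CSS safety check."""
    # A value that is exactly one comment passes through unchanged.
    if _ONE_COMMENT.fullmatch(value):
        return value
    decoded = _decode_css_escapes(value)
    # Strip comments so that "ex/**/pression" cannot hide a keyword.
    decoded = re.sub(r"/\*.*?\*/", "", decoded, flags=re.DOTALL)
    if _FORBIDDEN.search(decoded):
        return "/* insecure input */"
    return decoded
```

Note how "background: \75 rl(x)" is only caught because the escape \75 is decoded to "u" before the blocklist runs.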
static MediaWiki\Parser\Sanitizer::cleanUrl ( $url)
string | $url |
Definition at line 1774 of file Sanitizer.php.
References $matches.
static MediaWiki\Parser\Sanitizer::decodeCharReferences ( $text)
Decode any character references, numeric or named entities, in the text and return a UTF-8 string.
string | $text |
Definition at line 1368 of file Sanitizer.php.
Referenced by MediaWiki\Request\WebRequestUpload\getName().
static MediaWiki\Parser\Sanitizer::decodeCharReferencesAndNormalize ( $text)
Decode any character references, numeric or named entities, in the text and normalize the resulting string.
(T16952)
This is useful for page titles, not for text to be displayed; MediaWiki allows HTML entities to escape normalization as a feature.
string | $text | Already normalized, containing entities |
Definition at line 1385 of file Sanitizer.php.
Referenced by MediaWiki\Title\MediaWikiTitleCodec\parseTitle().
static MediaWiki\Parser\Sanitizer::decodeTagAttributes ( $text)
Return an associative array of attribute names and values from a partial tag string.
Attribute names are forced to lowercase, character references are decoded to UTF-8 text.
string | $text |
Definition at line 1137 of file Sanitizer.php.
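A Python sketch of the parsing step, assuming a simple grammar (a name, optionally followed by "=" and a double-quoted, single-quoted or unquoted value) and using html.unescape() in place of Sanitizer::decodeCharReferences(). The real parser handles more edge cases; the helper name is hypothetical.

```python
import html
import re

# Assumed grammar: a name, optionally "=" and a double-quoted,
# single-quoted or unquoted value.
_ATTR = re.compile(
    r"""([^\s/>="'\x00-\x1f]+)                          # attribute name
        (?:\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>"']+)))?  # optional value
    """,
    re.VERBOSE,
)


def decode_tag_attributes(text: str) -> dict[str, str]:
    """Hypothetical sketch: map lowercased attribute names to decoded values."""
    attribs: dict[str, str] = {}
    for m in _ATTR.finditer(text):
        # Exactly one of the three value groups matched, or none (bare name).
        raw = next((g for g in m.group(2, 3, 4) if g is not None), "")
        # html.unescape() stands in for Sanitizer::decodeCharReferences().
        attribs[m.group(1).lower()] = html.unescape(raw)
    return attribs
```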
static MediaWiki\Parser\Sanitizer::encodeAttribute ( $text)
Encode an attribute value for HTML output.
string | $text |
Definition at line 865 of file Sanitizer.php.
static MediaWiki\Parser\Sanitizer::escapeClass ( $class)
Given a value, escape it so that it can be used as a CSS class and return it.
string | $class |
Definition at line 1103 of file Sanitizer.php.
Referenced by MediaWiki\SpecialPage\ChangesListSpecialPage\makeLegend().
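A Python sketch, assuming the unsafe set is a leading digit or hyphen, ASCII control characters and space, CSS-significant punctuation, and U+00A0, each replaced by "_" with trailing underscores trimmed. This is our reading of the behaviour, not a copy of the PHP source, and the helper name is hypothetical.

```python
import re

# Assumed unsafe set: a leading digit or hyphen, ASCII controls and space,
# CSS-significant punctuation, and U+00A0 (no-break space).
_UNSAFE = re.compile(r"(^[0-9\-])|[\x00-\x20!\"#$%&'()*+,./:;<=>?@\[\\\]^`{|}~]|\u00a0")


def escape_class(value: str) -> str:
    """Hypothetical sketch: replace unsafe characters with '_', trim trailing '_'."""
    return _UNSAFE.sub("_", value).rstrip("_")
```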
static MediaWiki\Parser\Sanitizer::escapeHtmlAllowEntities ( $html)
Given HTML input, escape with htmlspecialchars but un-escape entities.
This allows (generally harmless) entities like &#160; to survive.
string | $html | HTML to escape |
Definition at line 1120 of file Sanitizer.php.
Referenced by MediaWiki\Pager\AllMessagesTablePager\formatValue().
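One way to get the effect described is to decode entities first and escape afterwards, so an existing "&amp;" is not double-escaped and "&#160;" survives as the character it names. A Python sketch, assuming that order of operations; html.unescape() and html.escape() stand in for MediaWiki's decoder and PHP's htmlspecialchars(), and html.escape() writes ' as &#x27; rather than PHP's &#039;.

```python
import html


def escape_html_allow_entities(fragment: str) -> str:
    """Hypothetical sketch: escape markup without double-escaping entities."""
    # Decode entities first so "&amp;" does not become "&amp;amp;".
    decoded = html.unescape(fragment)
    # Then escape <, >, & and both quote characters.
    return html.escape(decoded, quote=True)
```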
static MediaWiki\Parser\Sanitizer::escapeIdForAttribute ( $id, $mode=self::ID_PRIMARY)
Given a section name or other user-generated or otherwise unsafe string, escapes it to be a valid HTML id attribute.
WARNING: The output of this function is not guaranteed to be HTML safe, so be sure to use proper escaping.
string | $id | String to escape |
int | $mode | One of ID_* constants, specifying whether the primary or fallback encoding should be used. |
Definition at line 957 of file Sanitizer.php.
References $wgFragmentMode.
Referenced by MediaWiki\Specials\SpecialListGroupRights\execute(), and MediaWiki\Specials\SpecialPasswordPolicies\execute().
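The two encodings selectable through ID_PRIMARY and ID_FALLBACK can be sketched in Python. This assumes the "html5" and "legacy" mode names used by $wgFragmentMode, that html5 mode only replaces spaces with underscores, and that legacy mode percent-encodes and then maps "%3A" back to ":" and every other "%" to ".". quote_plus() stands in for PHP's urlencode() and differs from it in small ways (for example, it leaves "~" unencoded). The helper name is hypothetical.

```python
from urllib.parse import quote_plus


def escape_id(id_: str, mode: str = "html5") -> str:
    """Hypothetical sketch of the 'html5' and 'legacy' fragment encodings."""
    if mode == "html5":
        # Assumed: only spaces are replaced; other characters are kept as-is.
        return id_.replace(" ", "_")
    if mode == "legacy":
        # Percent-encode (urlencode-style), then make the result id-friendly:
        # "%3A" goes back to ":" and every remaining "%" becomes ".".
        encoded = quote_plus(id_.replace(" ", "_"))
        return encoded.replace("%3A", ":").replace("%", ".")
    raise ValueError(f"unknown fragment mode: {mode!r}")
```

As the warning above says, neither result is HTML-safe by itself; it still has to be escaped when written into an attribute.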
static MediaWiki\Parser\Sanitizer::escapeIdForExternalInterwiki ( $id)
Given a section name or other user-generated or otherwise unsafe string, escapes it to be a valid URL fragment for external interwikis.
string | $id | String to escape |
Definition at line 1007 of file Sanitizer.php.
References $wgExternalInterwikiFragmentMode.
static MediaWiki\Parser\Sanitizer::escapeIdForLink ( $id)
Given a section name or other user-generated or otherwise unsafe string, escapes it to be a valid URL fragment.
WARNING: The output of this function is not guaranteed to be HTML safe, so be sure to use proper escaping.
string | $id | String to escape |
Definition at line 984 of file Sanitizer.php.
References $wgFragmentMode.
static MediaWiki\Parser\Sanitizer::fixTagAttributes ( $text, $element, $sorted=false)
Take a tag soup fragment listing an HTML element's attributes and normalize it to well-formed XML, discarding unwanted attributes.
Output is safe for further wikitext processing, with escaping of values that could trigger problems.
string | $text | |
string | $element | |
bool | $sorted | Whether to sort the attributes (default: false) |
Definition at line 843 of file Sanitizer.php.
Referenced by MediaWiki\Parser\Sanitizer\internalRemoveHtmlTags().
static MediaWiki\Parser\Sanitizer::getRecognizedTagData ( $extratags=[], $removetags=[])
Return the various lists of recognized tags.
string[] | $extratags | For any extra tags to include |
string[] | $removetags | For any tags (default or extra) to exclude |
Definition at line 157 of file Sanitizer.php.
References $wgAllowImageTag, and wfDeprecatedMsg().
Referenced by MediaWiki\Parser\Sanitizer\internalRemoveHtmlTags().
static MediaWiki\Parser\Sanitizer::hackDocType ()
Hack up a private DOCTYPE with HTML's standard entity declarations.
PHP 4 seemed to know these if you gave it an HTML doctype, but PHP 5.1 doesn't.
Use this when passing XHTML fragments to PHP's XML parsing functions.
Definition at line 1750 of file Sanitizer.php.
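static MediaWiki\Parser\Sanitizer::internalRemoveHtmlTags ( $text, $processCallback=null, $args=[], $extratags=[], $removetags=[])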
Cleans up HTML, removes dangerous tags and attributes, and removes HTML comments; BEWARE there may be unmatched HTML tags in the result.
Note: prefer Sanitizer::removeSomeTags() over this method. Sanitizer::removeSomeTags() is safer and will always return well-formed HTML; however, it is significantly slower (especially for short strings where setup costs predominate). This method is for internal use by the legacy parser, where we know the result will be cleaned up in a subsequent tidy pass.
string | $text | Original string; see T268353 for why untainted. |
callable|null | $processCallback | Callback to do any variable or parameter replacements in HTML attribute values. This argument should be considered |
array|bool | $args | Arguments for the processing callback |
array | $extratags | For any extra tags to include |
array | $removetags | For any tags (default or extra) to exclude |
Definition at line 322 of file Sanitizer.php.
References MediaWiki\Parser\Sanitizer\fixTagAttributes(), MediaWiki\Parser\Sanitizer\getRecognizedTagData(), and MediaWiki\Parser\Sanitizer\removeHTMLcomments().
Referenced by MediaWiki\Parser\Sanitizer\removeHTMLtags().
static MediaWiki\Parser\Sanitizer::isReservedDataAttribute ( $attr)
Given an attribute name, checks whether it is a reserved data attribute (such as data-mw-foo) which is unavailable to user-generated HTML so MediaWiki core and extension code can safely use it to communicate with frontend code.
string | $attr | Attribute name. |
Definition at line 659 of file Sanitizer.php.
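A Python sketch of the check, assuming the reserved prefixes are data-ooui, data-mw and data-parsoid, matched case-insensitively at the start of the name. Only data-mw-foo is confirmed by the description above; the other prefixes and the helper name are assumptions.

```python
import re

# Assumed reserved prefixes; only "data-mw-..." is named in the description.
_RESERVED = re.compile(r"data-(?:ooui|mw|parsoid)", re.IGNORECASE)


def is_reserved_data_attribute(attr: str) -> bool:
    """Hypothetical sketch: True if attr uses a reserved data-* prefix."""
    # match() anchors at the start of the string, so only prefixes count.
    return _RESERVED.match(attr) is not None
```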
static MediaWiki\Parser\Sanitizer::mergeAttributes ( $a, $b)
Merge two sets of HTML attributes.
Conflicting items in the second set will override those in the first, except for 'class' attributes which will be combined (if they're both strings).
array | $a | |
array | $b |
Definition at line 680 of file Sanitizer.php.
Referenced by MediaWiki\EditPage\TextboxBuilder\mergeClassesIntoAttributes().
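The merge rule above can be sketched in Python, with dicts standing in for PHP arrays. Dropping repeated class names while keeping their first-seen order is an assumption; the helper name is hypothetical.

```python
def merge_attributes(a: dict, b: dict) -> dict:
    """Hypothetical sketch: b overrides a, except string 'class' values are combined."""
    # Later values (from b) win for every conflicting key.
    out = {**a, **b}
    ca, cb = a.get("class"), b.get("class")
    if isinstance(ca, str) and isinstance(cb, str) and ca != cb:
        # Combine both class lists; dict.fromkeys drops duplicates, keeps order.
        out["class"] = " ".join(dict.fromkeys(f"{ca} {cb}".split()))
    return out
```

If either class value is not a string (for example a list), the value from the second set simply wins, as for any other key.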
static MediaWiki\Parser\Sanitizer::normalizeCharReferences ( $text)
Ensure that any entities and character references are legal for XML and XHTML specifically.
Any stray bits will be &-escaped to result in a valid text fragment.
a. Named character references can only be &lt; &gt; &amp; &quot;; all others are numericized (this way we're well-formed even without a DTD).
b. Any numeric character references must be legal characters, not invalid or forbidden ones.
c. Use lowercase "&#x", not "&#X".
d. Fix or reject non-valid attributes.
string | $text |
Definition at line 1255 of file Sanitizer.php.
Referenced by MediaWiki\Tidy\RemexCompatFormatter\characters(), and MediaWiki\Tidy\RemexCompatFormatter\element().
static MediaWiki\Parser\Sanitizer::normalizeCss ( $value)
Normalize CSS into a format we can easily search for hostile input.
string | $value | The CSS string |
Definition at line 701 of file Sanitizer.php.
static MediaWiki\Parser\Sanitizer::normalizeSectionNameWhitespace ( $section)
Normalizes whitespace in a section name, such as might be returned by Parser::stripSectionName(), for use in the ids that are used for section links.
string | $section |
Definition at line 1236 of file Sanitizer.php.
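A Python sketch, assuming runs of spaces and underscores collapse to a single space before trimming. str.strip() also removes tabs and newlines, which may differ slightly from PHP's trim(); the helper name is hypothetical.

```python
import re


def normalize_section_name_whitespace(section: str) -> str:
    """Hypothetical sketch: collapse runs of spaces/underscores, then trim."""
    return re.sub(r"[ _]+", " ", section).strip()
```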
static MediaWiki\Parser\Sanitizer::removeHTMLcomments ( $text)
Remove '<!--', '-->', and everything between.
To avoid leaving blank lines, when a comment is both preceded and followed by a newline (ignoring spaces), trim leading and trailing spaces and one of the newlines.
string | $text |
Definition at line 446 of file Sanitizer.php.
Referenced by MediaWiki\Parser\Sanitizer\internalRemoveHtmlTags().
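The blank-line rule is easiest to see in code. A Python sketch of it follows; treating an unterminated "<!--" as running to the end of the text is an assumption, and the helper name is hypothetical.

```python
def remove_html_comments(text: str) -> str:
    """Hypothetical sketch: drop <!-- ... --> without leaving blank lines."""
    while (start := text.find("<!--")) != -1:
        end = text.find("-->", start + 4)
        if end == -1:
            # Assumed: an unterminated comment swallows the rest of the text.
            return text[:start]
        end += 3  # index just past "-->"
        # Widen the span over spaces on either side of the comment.
        left = start
        while left > 0 and text[left - 1] == " ":
            left -= 1
        right = end
        while right < len(text) and text[right] == " ":
            right += 1
        if left > 0 and text[left - 1] == "\n" and right < len(text) and text[right] == "\n":
            # Comment sits on its own line: remove spaces, comment and one newline.
            text = text[:left] + text[right + 1:]
        else:
            # Otherwise remove only the comment itself.
            text = text[:start] + text[end:]
    return text
```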
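static MediaWiki\Parser\Sanitizer::removeHTMLtags ( $text, $processCallback=null, $args=[], $extratags=[], $removetags=[])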
Cleans up HTML, removes dangerous tags and attributes, and removes HTML comments; BEWARE there may be unmatched HTML tags in the result.
Note: prefer Sanitizer::removeSomeTags() over this method. Sanitizer::removeSomeTags() is safer and will always return well-formed HTML; however, it is significantly slower (especially for short strings where setup costs predominate). This method, although faster, should only be used where we know the result will be cleaned up in a subsequent tidy pass.
string | $text | Original string; see T268353 for why untainted. |
callable|null | $processCallback | Callback to do any variable or parameter replacements in HTML attribute values. This argument should be considered |
array|bool | $args | Arguments for the processing callback |
array | $extratags | For any extra tags to include |
array | $removetags | For any tags (default or extra) to exclude |
Definition at line 285 of file Sanitizer.php.
References MediaWiki\Parser\Sanitizer\internalRemoveHtmlTags(), and wfDeprecated().
static MediaWiki\Parser\Sanitizer::removeSomeTags (string $text, array $options=[])
Cleans up HTML, removes dangerous tags and attributes, and removes HTML comments; the result will always be balanced and tidy HTML.
string | $text | Source string; see T268353 for why untainted |
array | $options | Options controlling the cleanup:
string[] $options['extraTags'] Any extra tags to allow (this property taints the whole array).
string[] $options['removeTags'] Any tags (default or extra) to exclude.
callable(Attributes,...):Attributes $options['attrCallback'] Callback to do any variable or parameter replacements in HTML attribute values before further cleanup; should be considered |
Definition at line 397 of file Sanitizer.php.
static MediaWiki\Parser\Sanitizer::safeEncodeAttribute ( $text)
Encode an attribute value for HTML tags, with extra armoring against further wiki processing.
string | $text |
Definition at line 909 of file Sanitizer.php.
References $matches, and wfUrlProtocols().
static MediaWiki\Parser\Sanitizer::safeEncodeTagAttributes ( $assoc_array)
Build a partial tag string from an associative array of attribute names and values as returned by decodeTagAttributes.
array | $assoc_array |
Definition at line 1179 of file Sanitizer.php.
static MediaWiki\Parser\Sanitizer::stripAllTags ( $html)
Take a fragment of (potentially invalid) HTML and return a version with any tags removed, encoded as plain text.
Warning: this return value must be further escaped for literal inclusion in HTML output as of 1.10!
string | $html | HTML fragment |
Definition at line 1723 of file Sanitizer.php.
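Because the result is decoded plain text, it can contain "<" and "&" again, which is why the warning above matters. A Python sketch of the idea using the standard library's html.parser (an assumption for illustration; it is not what MediaWiki uses, and the helper names are hypothetical):

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects character data and drops every tag (hypothetical helper)."""

    def __init__(self) -> None:
        # convert_charrefs=True makes handle_data() receive decoded text.
        super().__init__(convert_charrefs=True)
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def strip_all_tags(fragment: str) -> str:
    """Hypothetical sketch: return the text content of an HTML fragment."""
    parser = _TextCollector()
    parser.feed(fragment)
    parser.close()  # flush any buffered text
    return "".join(parser.parts)
```

strip_all_tags("x &lt;y&gt;") returns "x <y>", so the result must go through an escaper such as html.escape() before it is embedded in HTML.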
static MediaWiki\Parser\Sanitizer::validateAttributes ( $attribs, $allowed)
Take an array of attribute names and values and normalize or discard illegal values.
array | $attribs | |
array | $allowed | List of allowed attribute names, as an associative array where keys give valid attribute names (since 1.34). Before 1.35, passing a sequential array of valid attribute names was permitted but that is now deprecated. |
Check for legal values where the DTD limits things.
Check for unique id attribute :P
Definition at line 553 of file Sanitizer.php.
References wfDeprecated(), and wfUrlProtocols().
static MediaWiki\Parser\Sanitizer::validateEmail ( $addr)
Does a string look like an e-mail address?
This validates an email address using an HTML5 specification found at http://www.whatwg.org/html/states-of-the-type-attribute.html#valid-e-mail-address, which as of 2011-01-24 says:
A valid e-mail address is a string that matches the ABNF production 1*( atext / "." ) "@" ldh-str *( "." ldh-str ) where atext is defined in RFC 5322 section 3.2.3, and ldh-str is defined in RFC 1034 section 3.5.
This function is an implementation of the specification as requested in T24449.
Client-side forms will use the same standard validation rules via JS or HTML 5 validation; additional restrictions can be enforced server-side by extensions via the 'isValidEmailAddr' hook.
Note that this validation doesn't 100% match RFC 2822, but is believed to be liberal enough for wide use. Some invalid addresses will still pass validation here.
string | $addr | E-mail address |
Definition at line 1883 of file Sanitizer.php.
Referenced by MediaWiki\Specials\SpecialConfirmEmail\execute(), MediaWiki\User\PasswordReset\execute(), MediaWiki\SpecialPage\LoginSignupSpecialPage\getFieldDefinitions(), and MediaWiki\Auth\UserDataAuthenticationRequest\populateUser().
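The ABNF quoted above translates directly into a regular expression. A Python sketch follows; the character classes are our reading of RFC 5322 atext and RFC 1034 ldh-str, MediaWiki's own PHP pattern may differ in detail, and the helper name is hypothetical. Like the specification, it accepts dotless domains such as localhost and does not require a TLD.

```python
import re

# atext characters from RFC 5322 section 3.2.3 (the local part may also use ".").
_ATEXT = r"a-z0-9!#$%&'*+\-/=?^_`{|}~"
# ldh-str characters from RFC 1034 section 3.5: letters, digits, hyphen.
_LDH = r"a-z0-9\-"
# 1*( atext / "." ) "@" ldh-str *( "." ldh-str ), matched case-insensitively.
_HTML5_EMAIL = re.compile(rf"[{_ATEXT}.]+@[{_LDH}]+(?:\.[{_LDH}]+)*", re.IGNORECASE)


def looks_like_email(addr: str) -> bool:
    """Hypothetical helper: True if addr matches the HTML5 production."""
    return _HTML5_EMAIL.fullmatch(addr) is not None
```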
static MediaWiki\Parser\Sanitizer::validateTagAttributes ( $attribs, $element)
Take an array of attribute names and values and normalize or discard illegal values for the given element type.
array | $attribs | |
string | $element |
Check for legal values where the DTD limits things.
Check for unique id attribute :P
Definition at line 530 of file Sanitizer.php.
Referenced by MediaWiki\Parser\RemexRemoveTagHandler\startTag().
const MediaWiki\Parser\Sanitizer::ID_FALLBACK = 1
Tells escapeIdForAttribute() to encode the ID using the fallback encoding, or return false if no fallback is configured.
Definition at line 90 of file Sanitizer.php.
const MediaWiki\Parser\Sanitizer::ID_PRIMARY = 0
Tells escapeIdForAttribute() to encode the ID using the wiki's primary encoding.
Definition at line 82 of file Sanitizer.php.