Parsoid
A bidirectional parser between wikitext and HTML5
Wikimedia\Parsoid\Core\Sanitizer Class Reference

Static Public Member Functions

static attributesAllowedInternal (string $element)
 Fetch the list of acceptable attributes for a given element name.
 
static normalizeCharReferences ( $text)
 Ensure that any entities and character references are legal for XML and XHTML specifically.
 
static cleanUrl (SiteConfig $siteConfig, string $href, string $mode)
 
static decodeCharReferences (string $text)
 Decode any character references, numeric or named entities, in the text and return a UTF-8 string.
 
static normalizeCss (string $value)
 Normalize CSS into a format we can easily search for hostile input.
 
static isReservedDataAttribute (string $attr)
 Given an attribute name, checks whether it is a reserved data attribute (such as data-mw-foo) which is unavailable to user-generated HTML so MediaWiki core and extension code can safely use it to communicate with frontend code.
 
static sanitizeTagAttrs (SiteConfig $siteConfig, ?string $tagName, ?Token $token, array $attrs)
 
static applySanitizedArgs (SiteConfig $siteConfig, Element $wrapper, array $attrs)
 Sanitize and apply attributes to a wrapper element.
 
static checkCss (string $text)
 
static cssDecodeCallback ( $matches)
 
static sanitizeTitleURI (string $title, bool $isInterwiki=false)
 Sanitize a title to be used in a URI?
 
static armorFrenchSpaces ( $text, $space=' ')
 Armor French spaces with a replacement character.
 
static escapeIdForAttribute (string $id, $mode=self::ID_PRIMARY)
 Given a section name or other user-generated or otherwise unsafe string, escapes it to be a valid HTML id attribute.
 
static escapeIdForLink (string $id)
 Given a section name or other user-generated or otherwise unsafe string, escapes it to be a valid URL fragment.
 
static escapeIdReferenceList (string $referenceString)
 Given a string containing a space-delimited list of ids, escape each id to match ids escaped by the escapeIdForAttribute() function.
 
static normalizeSectionIdWhiteSpace (string $id)
 

Public Attributes

const ID_FALLBACK = 1
 Tells escapeIdForAttribute() to encode the ID using the fallback encoding, or return false if no fallback is configured.
 
const FIXTAGS
 

Member Function Documentation

◆ applySanitizedArgs()

static Wikimedia\Parsoid\Core\Sanitizer::applySanitizedArgs ( SiteConfig  $siteConfig,
Element  $wrapper,
array  $attrs 
)
static

Sanitize and apply attributes to a wrapper element.

Used primarily when we're applying tokenized attributes directly to DOM elements, which wouldn't have had a chance to be sanitized before tree building.

Parameters
SiteConfig  $siteConfig
Element  $wrapper  wrapper
array  $attrs  attributes

◆ armorFrenchSpaces()

static Wikimedia\Parsoid\Core\Sanitizer::armorFrenchSpaces (   $text,
  $space = ' ' 
)
static

Armor French spaces with a replacement character.

Since
1.32
Parameters
string  $text  Text to armor
string  $space  Space character for the French spaces, defaults to ' '
Returns
string Armored text

◆ attributesAllowedInternal()

static Wikimedia\Parsoid\Core\Sanitizer::attributesAllowedInternal ( string  $element)
static

Fetch the list of acceptable attributes for a given element name.

Parameters
string  $element
Returns
array

◆ checkCss()

static Wikimedia\Parsoid\Core\Sanitizer::checkCss ( string  $text)
static
Parameters
string  $text
Returns
string

◆ cleanUrl()

static Wikimedia\Parsoid\Core\Sanitizer::cleanUrl ( SiteConfig  $siteConfig,
string  $href,
string  $mode 
)
static
Parameters
SiteConfig  $siteConfig
string  $href
string  $mode
Returns
string|null

◆ cssDecodeCallback()

static Wikimedia\Parsoid\Core\Sanitizer::cssDecodeCallback (   $matches)
static
Parameters
array  $matches
Returns
string

◆ decodeCharReferences()

static Wikimedia\Parsoid\Core\Sanitizer::decodeCharReferences ( string  $text)
static

Decode any character references, numeric or named entities, in the text and return a UTF-8 string.

Parameters
string  $text
Returns
string
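
As a rough illustration of the decoding described above, PHP's built-in html_entity_decode() behaves similarly. This is a stand-in and not Parsoid's implementation: the name decodeCharReferencesSketch is hypothetical, and the real method may treat edge cases (invalid code points, references without a trailing semicolon) differently.

```php
<?php
// Hypothetical stand-in, NOT the Parsoid implementation.
// Decodes named, decimal and hexadecimal character references to UTF-8.
function decodeCharReferencesSketch( string $text ): string {
    // ENT_HTML5 selects the HTML5 named-entity table; ENT_QUOTES also
    // decodes references to single and double quotes.
    return html_entity_decode( $text, ENT_QUOTES | ENT_HTML5, 'UTF-8' );
}
```

Decoding happens once only, so a doubly escaped input such as `&amp;lt;` comes back as `&lt;` rather than `<`.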

◆ escapeIdForAttribute()

static Wikimedia\Parsoid\Core\Sanitizer::escapeIdForAttribute ( string  $id,
  $mode = self::ID_PRIMARY 
)
static

Given a section name or other user-generated or otherwise unsafe string, escapes it to be a valid HTML id attribute.

WARNING: unlike escapeId(), the output of this function is not guaranteed to be HTML safe; be sure to use proper escaping.

In Parsoid, proper escaping is usually handled for us by the HTML serialization algorithm, but be careful of corner cases (such as emitting attributes in wikitext).

Parameters
string  $id  String to escape
int  $mode  One of the ID_* constants, specifying whether the primary or fallback encoding should be used.
Returns
string Escaped ID
Since
1.30
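
To make the warning above concrete, here is a minimal sketch of escaping an id when markup is assembled by hand instead of going through an HTML serializer. The function renderAnchorSketch and its $escapedId argument are hypothetical; $escapedId stands in for a value already returned by escapeIdForAttribute().

```php
<?php
// Hypothetical helper for illustration only.
// $escapedId stands in for the output of escapeIdForAttribute(), which is
// a valid id but may still contain characters that are special in HTML.
function renderAnchorSketch( string $escapedId ): string {
    // htmlspecialchars() escapes &, <, > and quotes so the value cannot
    // terminate the attribute or open a new tag.
    $safe = htmlspecialchars( $escapedId, ENT_QUOTES | ENT_HTML5, 'UTF-8' );
    return '<span id="' . $safe . '"></span>';
}
```

When the id is set through a DOM API and serialized afterwards, this step is normally unnecessary, which is the case the note about Parsoid's HTML serialization refers to.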

◆ escapeIdForLink()

static Wikimedia\Parsoid\Core\Sanitizer::escapeIdForLink ( string  $id)
static

Given a section name or other user-generated or otherwise unsafe string, escapes it to be a valid URL fragment.

WARNING: unlike escapeId(), the output of this function is not guaranteed to be HTML safe; be sure to use proper escaping.

Parameters
string  $id  String to escape
Returns
string Escaped ID
Since
1.30

◆ escapeIdReferenceList()

static Wikimedia\Parsoid\Core\Sanitizer::escapeIdReferenceList ( string  $referenceString)
static

Given a string containing a space-delimited list of ids, escape each id to match ids escaped by the escapeIdForAttribute() function.

Since
1.27
Parameters
string  $referenceString  Space-delimited list of ids
Returns
string
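
The split, escape and rejoin pattern described above might look roughly like the following. This is a sketch under assumptions: escapeIdReferenceListSketch is a hypothetical name, and the per-id escaping is passed in as a callable because the actual encoding used by escapeIdForAttribute() is not reproduced here.

```php
<?php
// Hypothetical sketch, NOT the Parsoid implementation.
// $escapeId stands in for escapeIdForAttribute(); any callable taking and
// returning a string can be supplied.
function escapeIdReferenceListSketch( string $referenceString, callable $escapeId ): string {
    // Split on runs of whitespace and drop empty pieces.
    $ids = preg_split( '/\s+/', $referenceString, -1, PREG_SPLIT_NO_EMPTY );
    // Escape each id on its own, then rejoin with single spaces.
    return implode( ' ', array_map( $escapeId, $ids ) );
}
```

For example, passing PHP's rawurlencode() as a placeholder escaper (it is not the encoding Parsoid uses) turns `" foo  b@r "` into `"foo b%40r"`.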

◆ isReservedDataAttribute()

static Wikimedia\Parsoid\Core\Sanitizer::isReservedDataAttribute ( string  $attr)
static

Given an attribute name, checks whether it is a reserved data attribute (such as data-mw-foo) which is unavailable to user-generated HTML so MediaWiki core and extension code can safely use it to communicate with frontend code.

Parameters
string  $attr  Attribute name.
Returns
bool
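
A check of this kind can be sketched as a prefix test. The function name and the default prefix list below are assumptions made for illustration: only the data-mw-foo example comes from the description above, and the set of prefixes that Parsoid actually reserves may be larger.

```php
<?php
// Hypothetical sketch, NOT the Parsoid implementation.
// The default prefix list is an assumption derived from the "data-mw-foo"
// example; the real reserved set may include other prefixes.
function isReservedDataAttributeSketch( string $attr, array $reservedPrefixes = [ 'data-mw' ] ): bool {
    foreach ( $reservedPrefixes as $prefix ) {
        // Attribute names are compared case-insensitively.
        if ( strncasecmp( $attr, $prefix, strlen( $prefix ) ) === 0 ) {
            return true;
        }
    }
    return false;
}
```

A sanitizer would typically drop any user-supplied attribute for which such a check returns true.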

◆ normalizeCharReferences()

static Wikimedia\Parsoid\Core\Sanitizer::normalizeCharReferences (   $text)
static

Ensure that any entities and character references are legal for XML and XHTML specifically.

Any stray bits will be &-escaped to result in a valid text fragment.

a. named char refs can only be &lt; &gt; &amp; &quot;; others are numericized (this way we're well-formed even without a DTD)
b. any numeric char refs must be legal chars, not invalid or forbidden
c. use lowercase "&#x", not "&#X"
d. fix or reject non-valid attributes

Parameters
string  $text
Returns
string

◆ normalizeCss()

static Wikimedia\Parsoid\Core\Sanitizer::normalizeCss ( string  $value)
static

Normalize CSS into a format we can easily search for hostile input.

  • decode character references
  • decode escape sequences
  • convert characters that IE6 interprets into ASCII
  • remove comments, unless the entire value is one single comment

Parameters
string  $value  the CSS string
Returns
string normalized CSS
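
Three of the steps listed above (decoding character references, decoding escape sequences, and removing comments) can be sketched as follows. This is a simplified illustration, not the Parsoid code: normalizeCssSketch is a hypothetical name, html_entity_decode() stands in for decodeCharReferences(), and the IE6 character conversion is omitted.

```php
<?php
// Hypothetical sketch, NOT the Parsoid implementation; the IE6 step is omitted.
function normalizeCssSketch( string $value ): string {
    // 1. Decode character references (stand-in for decodeCharReferences()).
    $value = html_entity_decode( $value, ENT_QUOTES | ENT_HTML5, 'UTF-8' );

    // 2. Decode CSS escapes: "\65 " (hex, optional trailing whitespace)
    //    or a backslash followed by any single character.
    $decoded = preg_replace_callback(
        '/\\\\(?:([0-9A-Fa-f]{1,6})\s?|(.))/su',
        static function ( array $m ): string {
            if ( $m[1] !== '' ) {
                // Reuse the entity decoder to turn the hex code point into UTF-8.
                return html_entity_decode( '&#x' . $m[1] . ';', ENT_QUOTES | ENT_HTML5, 'UTF-8' );
            }
            return $m[2];
        },
        $value
    );
    $value = $decoded ?? $value;

    // 3. Remove comments, unless the entire value is one single comment.
    if ( !preg_match( '!^\s*/\*[^*/]*\*/\s*$!', $value ) ) {
        $value = preg_replace( '!/\*.*?\*/!s', ' ', $value ) ?? $value;
    }
    return $value;
}
```

Normalizing first means a later check for hostile input only has to look for plain keywords, since spellings such as `r\65 d` have already been reduced to `red`.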

◆ normalizeSectionIdWhiteSpace()

static Wikimedia\Parsoid\Core\Sanitizer::normalizeSectionIdWhiteSpace ( string  $id)
static
Parameters
string  $id
Returns
string

◆ sanitizeTagAttrs()

static Wikimedia\Parsoid\Core\Sanitizer::sanitizeTagAttrs ( SiteConfig  $siteConfig,
?string  $tagName,
?Token  $token,
array  $attrs 
)
static
Parameters
SiteConfig  $siteConfig
?string  $tagName
?Token  $token
array  $attrs
Returns
array

◆ sanitizeTitleURI()

static Wikimedia\Parsoid\Core\Sanitizer::sanitizeTitleURI ( string  $title,
bool  $isInterwiki = false 
)
static

Sanitize a title to be used in a URI?

Parameters
string  $title
bool  $isInterwiki
Returns
string

Member Data Documentation

◆ FIXTAGS

const Wikimedia\Parsoid\Core\Sanitizer::FIXTAGS
Initial value:
= [
# French spaces, last one Guillemet-left
# only if it isn't followed by a word character.
'/ (?=[?:;!%»›](?!\w))/u' => "%s",
# French spaces, Guillemet-right
'/([«‹]) /u' => "\\1%s",
]
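
One plausible way to apply a pattern table like this is to fill each "%s" placeholder with the chosen space character and pass the result to preg_replace(). The sketch below copies the two patterns from the initial value above; the constant and function names are hypothetical, and how armorFrenchSpaces() really consumes FIXTAGS is an assumption.

```php
<?php
// Patterns copied from the FIXTAGS initial value; "%s" marks where the
// replacement space is substituted.
const FIXTAGS_SKETCH = [
    '/ (?=[?:;!%»›](?!\w))/u' => "%s",
    '/([«‹]) /u' => "\\1%s",
];

// Hypothetical sketch, NOT the Parsoid implementation.
function armorFrenchSpacesSketch( string $text, string $space ): string {
    // Escape "\" and "$" so $space is inserted literally by preg_replace().
    $safe = str_replace( [ '\\', '$' ], [ '\\\\', '\\$' ], $space );
    $replacements = array_map(
        static fn ( string $template ): string => sprintf( $template, $safe ),
        array_values( FIXTAGS_SKETCH )
    );
    return preg_replace( array_keys( FIXTAGS_SKETCH ), $replacements, $text ) ?? $text;
}
```

With "_" as a visible stand-in for the space, `"« oui »"` becomes `"«_oui_»"`, while `"a :b"` is left alone because the colon is followed by a word character.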

◆ ID_FALLBACK

const Wikimedia\Parsoid\Core\Sanitizer::ID_FALLBACK = 1

Tells escapeIdForAttribute() to encode the ID using the fallback encoding, or return false if no fallback is configured.

Since
1.30
