Parsoid
A bidirectional parser between wikitext and HTML5
Wikimedia\Parsoid\Core\Sanitizer Class Reference

Static Public Member Functions

static attributesAllowedInternal (string $element)
 Fetch the list of acceptable attributes for a given element name.
 
static normalizeCharReferences (string $text)
 Ensure that any entities and character references are legal for XML and XHTML specifically.
 
static cleanUrl (SiteConfig $siteConfig, string $href, string $mode)
 
static decodeCharReferences (string $text)
 Decode any character references, numeric or named entities, in the text and return a UTF-8 string.
 
static normalizeCss (string $value)
 Normalize CSS into a format we can easily search for hostile input.
 
static isReservedDataAttribute (string $attr)
 Given an attribute name, checks whether it is a reserved data attribute (such as data-mw-foo) which is unavailable to user-generated HTML so MediaWiki core and extension code can safely use it to communicate with frontend code.
 
static sanitizeTagAttrs (SiteConfig $siteConfig, ?string $tagName, ?Token $token, array $attrs)
 
static applySanitizedArgs (SiteConfig $siteConfig, Element $wrapper, array $attrs)
 Sanitize and apply attributes to a wrapper element.
 
static checkCss (string $text)
 
static cssDecodeCallback (array $matches)
 
static sanitizeTitleURI (string $title, bool $isInterwiki=false)
 Sanitize a title for use in a URI.
 

Public Attributes

const ID_PRIMARY = 0
 Tells escapeUrlForHtml() to encode the ID using the wiki's primary encoding.
 
const ID_FALLBACK = 1
 Tells escapeUrlForHtml() to encode the ID using the fallback encoding, or return false if no fallback is configured.
 

Member Function Documentation

◆ applySanitizedArgs()

static Wikimedia\Parsoid\Core\Sanitizer::applySanitizedArgs ( SiteConfig $siteConfig,
Element $wrapper,
array $attrs )

Sanitize and apply attributes to a wrapper element.

Used primarily when tokenized attributes are applied directly to DOM elements, which would not otherwise have been sanitized before tree building.

Parameters
SiteConfig $siteConfig
Element $wrapper: wrapper element
array $attrs: attributes
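For a concrete picture of "sanitize, then apply", here is a minimal sketch using PHP's DOM extension. The function name and the `$allowed` allowlist are hypothetical; the real method takes a `SiteConfig` and applies Parsoid's full attribute rules, which this sketch does not reproduce.

```php
<?php
/**
 * Hypothetical sketch: keep only allowlisted attributes, then set them on a
 * DOM element that was built without passing through the token sanitizer.
 *
 * @param array<string,string> $attrs attribute name => value
 * @param array<string,int> $allowed hypothetical allowlist keyed by name
 */
function sketchApplySanitizedArgs(DOMElement $wrapper, array $attrs, array $allowed): void
{
    foreach ($attrs as $name => $value) {
        $name = strtolower((string)$name);
        // Drop anything not on the allowlist (for example event handlers).
        if (!isset($allowed[$name])) {
            continue;
        }
        $wrapper->setAttribute($name, (string)$value);
    }
}

$doc = new DOMDocument();
$span = $doc->createElement('span');
$doc->appendChild($span);
sketchApplySanitizedArgs($span, ['class' => 'note', 'onclick' => 'alert(1)'], ['class' => 0, 'id' => 1]);
```

Filtering before `setAttribute()` matters here because nothing downstream re-sanitizes attributes that are written straight onto the tree.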

◆ attributesAllowedInternal()

static Wikimedia\Parsoid\Core\Sanitizer::attributesAllowedInternal ( string $element)

Fetch the list of acceptable attributes for a given element name.

Parameters
string $element
Returns
array<string,int>
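The documented return type, `array<string,int>`, is keyed by attribute name, so a caller can test membership with a hash lookup. The sketch below assumes that shape; the allowlist contents and both function names are made up for illustration, and building the map with `array_flip` is only one plausible way to obtain it.

```php
<?php
/**
 * Hypothetical stand-in for attributesAllowedInternal(): a tiny, made-up
 * allowlist returned as name => int.
 *
 * @return array<string,int>
 */
function sketchAttributesAllowed(string $element): array
{
    $common = ['class', 'id', 'style', 'title'];
    $extra = ['a' => ['href', 'rel'], 'td' => ['colspan', 'rowspan']];
    // array_flip turns a list of names into name => position.
    return array_flip(array_merge($common, $extra[strtolower($element)] ?? []));
}

/** Keep only the attributes allowed on $element (names compared case-insensitively). */
function sketchFilterAttrs(string $element, array $attrs): array
{
    $allowed = sketchAttributesAllowed($element);
    return array_filter(
        $attrs,
        // isset(), not a truthiness test: the first name maps to 0.
        static fn($name): bool => isset($allowed[strtolower((string)$name)]),
        ARRAY_FILTER_USE_KEY
    );
}
```

For example, `sketchFilterAttrs('a', ['href' => '/x', 'onclick' => 'y'])` keeps only `href`.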

◆ checkCss()

static Wikimedia\Parsoid\Core\Sanitizer::checkCss ( string $text)
Parameters
string $text
Returns
string
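`checkCss()` carries no description here. Assuming it behaves like the MediaWiki core method of the same name, it returns the CSS unchanged when it looks safe and a placeholder comment otherwise. The hostile-pattern list and the `/* insecure input */` replacement below are assumptions for illustration, not Parsoid's actual rules.

```php
<?php
/**
 * Hypothetical sketch of a CSS safety gate: reject the whole value when it
 * contains one of a few hostile constructs. A real check would run on
 * normalized CSS (escapes decoded, comments stripped) first.
 */
function sketchCheckCss(string $text): string
{
    // Assumed, non-exhaustive list of dangerous constructs.
    $hostile = '/expression\s*\(|url\s*\(|javascript\s*:|behavior\s*:|-moz-binding/i';
    if (preg_match($hostile, $text) === 1) {
        return '/* insecure input */';
    }
    return $text;
}
```

Returning a comment instead of an empty string keeps the output valid CSS while signalling that something was rejected.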

◆ cleanUrl()

static Wikimedia\Parsoid\Core\Sanitizer::cleanUrl ( SiteConfig $siteConfig,
string $href,
string $mode )
Parameters
SiteConfig $siteConfig
string $href
string $mode
Returns
string|null

◆ cssDecodeCallback()

static Wikimedia\Parsoid\Core\Sanitizer::cssDecodeCallback ( array $matches)
Parameters
array $matches
Returns
string

◆ decodeCharReferences()

static Wikimedia\Parsoid\Core\Sanitizer::decodeCharReferences ( string $text)

Decode any character references, numeric or named entities, in the text and return a UTF-8 string.

Parameters
string $text
Returns
string
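PHP's built-in `html_entity_decode()` performs a comparable decoding and gives a quick feel for the contract. This is an approximation, not Parsoid's implementation; the two may disagree on edge cases such as invalid code points.

```php
<?php
/**
 * Rough approximation of decodeCharReferences() using html_entity_decode().
 */
function sketchDecodeCharReferences(string $text): string
{
    // ENT_HTML5 enables the full HTML5 named-entity table; ENT_QUOTES also decodes quotes.
    return html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

echo sketchDecodeCharReferences('caf&eacute; &#x41;&#66; &amp;'), "\n"; // café AB &
```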

◆ isReservedDataAttribute()

static Wikimedia\Parsoid\Core\Sanitizer::isReservedDataAttribute ( string $attr)

Given an attribute name, checks whether it is a reserved data attribute (such as data-mw-foo) which is unavailable to user-generated HTML so MediaWiki core and extension code can safely use it to communicate with frontend code.

Parameters
string $attr: attribute name.
Returns
bool
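A case-insensitive prefix check is one way to implement this test. The sketch assumes the reserved prefixes are `data-ooui`, `data-mw` and `data-parsoid`, matching MediaWiki core's Sanitizer; Parsoid's source is the authority on the actual list, and the function name is hypothetical.

```php
<?php
/**
 * Sketch of the reserved-prefix test (assumed prefixes: data-ooui,
 * data-mw, data-parsoid).
 */
function sketchIsReservedDataAttribute(string $attr): bool
{
    // Prefix match, so data-mw, data-mw-foo and DATA-MW all count as reserved.
    return preg_match('/^data-(ooui|mw|parsoid)/i', $attr) === 1;
}
```

Because this is a prefix match, names such as `data-mwfoo` are reserved too, which errs on the safe side for user-generated HTML.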

◆ normalizeCharReferences()

static Wikimedia\Parsoid\Core\Sanitizer::normalizeCharReferences ( string $text)

Ensure that any entities and character references are legal for XML and XHTML specifically.

Any stray bits will be &-escaped to result in a valid text fragment.

a. named character references can only be &lt; &gt; &amp; &quot;; others are numericized (this way the output is well-formed even without a DTD)
b. any numeric character references must be legal characters, not invalid or forbidden ones
c. use lowercase "&#x", not "&#X"
d. fix or reject non-valid attributes

Parameters
string $text
Returns
string
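Rules a to c can be illustrated with a small, self-contained sketch. The entity table is a hypothetical three-entry subset, the function and constant names are invented, and rule d is not modelled.

```php
<?php
// Hypothetical, tiny subset of the HTML named-entity table (illustration only).
const SKETCH_NAMED_ENTITIES = ['nbsp' => 160, 'eacute' => 233, 'copy' => 169];
// Rule (a): XML predefines these, so they may stay named.
const SKETCH_XML_NAMES = ['lt', 'gt', 'amp', 'quot'];

// Rule (b): code points XML 1.0 accepts as characters.
function sketchIsLegalXmlChar(int $cp): bool
{
    return $cp === 0x09 || $cp === 0x0A || $cp === 0x0D
        || ($cp >= 0x20 && $cp <= 0xD7FF)
        || ($cp >= 0xE000 && $cp <= 0xFFFD)
        || ($cp >= 0x10000 && $cp <= 0x10FFFF);
}

function sketchNormalizeCharReferences(string $text): string
{
    // A named, decimal or hex reference, or a bare "&" when none follows.
    $re = '/&(?:([A-Za-z0-9]+);|#([0-9]{1,8});|#[xX]([0-9A-Fa-f]{1,8});)?/';
    return preg_replace_callback($re, static function (array $m): string {
        // Unmatched trailing groups are absent from $m, hence the ?? ''.
        $name = $m[1] ?? '';
        $dec = $m[2] ?? '';
        $hex = $m[3] ?? '';
        if ($name !== '') {
            if (in_array($name, SKETCH_XML_NAMES, true)) {
                return "&$name;";
            }
            if (array_key_exists($name, SKETCH_NAMED_ENTITIES)) {
                // Rule (a): numericize every other known entity.
                return '&#' . SKETCH_NAMED_ENTITIES[$name] . ';';
            }
        } elseif ($dec !== '' && sketchIsLegalXmlChar((int)$dec)) {
            return '&#' . (int)$dec . ';';
        } elseif ($hex !== '' && sketchIsLegalXmlChar(hexdec($hex))) {
            // Rule (c): always emit a lowercase "&#x".
            return sprintf('&#x%x;', hexdec($hex));
        }
        // Stray "&", unknown names and illegal code points get &-escaped.
        return '&amp;' . substr($m[0], 1);
    }, $text);
}
```

Note the comparisons against `''` rather than `empty()`: a decimal reference such as `&#0;` captures the string `"0"`, which `empty()` would treat as missing.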

◆ normalizeCss()

static Wikimedia\Parsoid\Core\Sanitizer::normalizeCss ( string $value)

Normalize CSS into a format we can easily search for hostile input.

  • decode character references
  • decode escape sequences
  • convert characters that IE6 interprets into ASCII
  • remove comments, unless the entire value is one single comment

Parameters
string $value: the CSS string
Returns
string: the normalized CSS
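The escape-decoding and comment-stripping steps listed above can be sketched as follows. This is a partial, hypothetical reimplementation: it skips character-reference decoding and the IE6 character folding, and it borrows `html_entity_decode()` only as a convenient way to turn a code point into UTF-8.

```php
<?php
/**
 * Partial sketch of CSS normalization: decode CSS escapes, then strip
 * comments unless the value is a single comment.
 */
function sketchNormalizeCss(string $value): string
{
    // "\" + 1-6 hex digits (plus one optional whitespace), or "\" + any other character.
    $value = preg_replace_callback(
        '/\\\\(?:([0-9A-Fa-f]{1,6})[ \t\r\n\f]?|(.))/su',
        static function (array $m): string {
            if (($m[1] ?? '') !== '') {
                return html_entity_decode('&#' . hexdec($m[1]) . ';', ENT_QUOTES | ENT_HTML5, 'UTF-8');
            }
            return $m[2] ?? '';
        },
        $value
    ) ?? $value; // keep the input if PCRE fails (e.g. invalid UTF-8)

    // Leave a value that is nothing but one comment untouched.
    if (preg_match('!^\s*/\*[^*/]*\*/\s*$!', $value) === 1) {
        return $value;
    }
    // Replace each comment with a space so neighbouring tokens stay apart.
    $value = preg_replace('!/\*.*?\*/!s', ' ', $value);
    // Drop everything after an unterminated comment opener.
    $pos = strpos($value, '/*');
    return $pos === false ? $value : substr($value, 0, $pos);
}
```

Replacing a comment with a space rather than deleting it means stripping `ex/**/pression` yields `ex pression`, not a freshly spliced `expression`.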

◆ sanitizeTagAttrs()

static Wikimedia\Parsoid\Core\Sanitizer::sanitizeTagAttrs ( SiteConfig $siteConfig,
?string $tagName,
?Token $token,
array $attrs )
Parameters
SiteConfig $siteConfig
?string $tagName
?Token $token
array $attrs
Returns
array

◆ sanitizeTitleURI()

static Wikimedia\Parsoid\Core\Sanitizer::sanitizeTitleURI ( string $title,
bool $isInterwiki = false )

Sanitize a title for use in a URI.

Parameters
string $title
bool $isInterwiki
Returns
string
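The description only hints at the behaviour, so here is a hypothetical sketch of one way to make a title URI-safe: split off the fragment at the first `#`, percent-encode a few URI-significant characters in the title part, and replace spaces in the fragment with underscores. The character set, the fragment rule and the function name are assumptions, and the `$isInterwiki` distinction is not modelled.

```php
<?php
/**
 * Hypothetical sketch of making a page title safe for use in a URI.
 */
function sketchSanitizeTitleURI(string $title): string
{
    $fragment = null;
    $hash = strpos($title, '#');
    if ($hash !== false) {
        $fragment = substr($title, $hash + 1);
        $title = substr($title, 0, $hash);
    }
    // Encode only characters that would change the meaning of the URI.
    $title = preg_replace_callback(
        '/[%? \[\]|<>]/',
        static fn(array $m): string => rawurlencode($m[0]),
        $title
    );
    if ($fragment !== null) {
        // Assumed fragment rule: spaces become underscores.
        $title .= '#' . str_replace(' ', '_', $fragment);
    }
    return $title;
}
```

Encoding only a small set of characters keeps titles such as `Café` readable in the resulting link.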

Member Data Documentation

◆ ID_FALLBACK

const Wikimedia\Parsoid\Core\Sanitizer::ID_FALLBACK = 1

Tells escapeUrlForHtml() to encode the ID using the fallback encoding, or return false if no fallback is configured.

Since
1.30

◆ ID_PRIMARY

const Wikimedia\Parsoid\Core\Sanitizer::ID_PRIMARY = 0

Tells escapeUrlForHtml() to encode the ID using the wiki's primary encoding.

Since
1.30
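The two constants select between a primary and an optional fallback ID encoding. A hypothetical sketch of how an encoder could honour them follows; the `$fragmentModes` configuration and the `html5`/`legacy` mode names are borrowed from MediaWiki's fragment-mode settings as an assumption, and the function is not the real `escapeUrlForHtml()`.

```php
<?php
// Mirrors of the documented constant values.
const SKETCH_ID_PRIMARY = 0;
const SKETCH_ID_FALLBACK = 1;

/**
 * Hypothetical ID encoder. $fragmentModes is an assumed configuration:
 * index 0 is the primary encoding, index 1 (optional) the fallback.
 *
 * @param string[] $fragmentModes e.g. ['html5'] or ['html5', 'legacy']
 * @return string|false
 */
function sketchEscapeId(string $id, int $mode, array $fragmentModes)
{
    $encoding = $fragmentModes[$mode] ?? null;
    if ($encoding === null) {
        // ID_FALLBACK requested but no fallback configured.
        return false;
    }
    $id = str_replace(' ', '_', $id);
    if ($encoding === 'legacy') {
        // Legacy style: percent-encode, then use "." for "%" but keep ":".
        $id = strtr(urlencode($id), ['%3A' => ':', '%' => '.']);
    }
    return $id;
}
```

Returning `false` rather than an empty string lets callers tell "no fallback configured" apart from an ID that legitimately encodes to nothing.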
