Parsoid
A bidirectional parser between wikitext and HTML5
Parsoid\Utils\Util Class Reference

This file contains general utilities for token transforms.

Static Public Member Functions

static stripParsoidIdPrefix (string $aboutId)
 Strip Parsoid id prefix from aboutID.
 
static isParsoidObjectId (string $aboutId)
 Check for Parsoid id prefix in an aboutID string.
 
static isVoidElement (string $name)
 Determine if the named tag is void (cannot have content).
 
static clone ( $obj, $deepClone=true)
 Deep clones by default.
 
static phpURLEncode ( $txt)
 This should not be used.
 
static decodeURI (string $s)
 Percent-decode only valid UTF-8 characters, leaving other encoded bytes alone.
 
static decodeURIComponent (string $s)
 Percent-decode only valid UTF-8 characters, leaving other encoded bytes alone.
 
static extractExtBody (Token $token)
 Extract extension source from the token.
 
static isValidDSR (?DomSourceRange $dsr, bool $all=false)
 Check for valid DSR range(s). DSR = "DOM Source Range".
 
static normalizeNamespaceName (string $name)
 Canonicalizes a namespace name.
 
static decodeWtEntities (string $text)
 Decode HTML5 entities in wikitext.
 
static escapeWtEntities (string $text)
 Entity-escape anything that would decode to a valid wikitext entity.
 
static escapeHtml (string $s)
 PORT-FIXME need accurate function description.
 
static entityEncodeAll (string $s)
 Encode all characters as entity references.
 
static isProtocolValid ( $linkTarget, Env $env)
 Determine whether the protocol of a link is potentially valid.
 
static getExtArgInfo (Token $extToken)
 Get argument information for an extension tag token.
 
static parseMediaDimensions (string $str, bool $onlyOne=false)
 Parse media dimensions.
 
static validateMediaParam (?int $num)
 Validate media parameters. More generally, this is defined by the media handler in core.
 
static getStar ( $revision)
 FIXME: Is this needed??
 
static magicMasqs ()
 Magic words masquerading as templates.
 
static isLinkTrail (string $text)
 Check whether some text is a valid link trail.
 
static bcp47n ( $code)
 Convert MediaWiki-format language code to a BCP47-compliant language code suitable for including in HTML.
 

Public Attributes

const COMMENT_REGEXP_FRAGMENT = '<!--(?>[\s\S]*?-->)'
 Regular expression fragment for matching wikitext comments.
 
const COMMENT_REGEXP = '/' . self::COMMENT_REGEXP_FRAGMENT . '/'
 Regular expression for matching a wikitext comment.
 
const TPL_META_TYPE_REGEXP = '#(?:^|\s)(mw:(?:Transclusion|Param)(?:/End)?)(?=$|\s)#'
 Regexp for checking the typeof values of marker metas that represent transclusion markup or template param markup.
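
As an illustration of how TPL_META_TYPE_REGEXP can be applied, the sketch below copies the pattern shown above into a standalone script and tests a space-separated typeof value against it. The helper name matchTplMetaType is hypothetical and not part of Parsoid.

```php
<?php
// Pattern copied verbatim from the constant documented above.
const TPL_META_TYPE_REGEXP = '#(?:^|\s)(mw:(?:Transclusion|Param)(?:/End)?)(?=$|\s)#';

// Hypothetical helper: returns the first transclusion/param marker type found
// in a space-separated typeof attribute value, or null if there is none.
function matchTplMetaType(string $typeof): ?string {
    return preg_match(TPL_META_TYPE_REGEXP, $typeof, $m) === 1 ? $m[1] : null;
}
```

With this helper, 'mw:Extension/ref mw:Param/End' yields 'mw:Param/End', while 'mw:TransclusionFoo' yields null because the lookahead requires whitespace or the end of the string after the marker.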
 

Static Public Attributes

static $linkTrailRegex
 This regex was generated by running through all Unicode characters and testing them against all regexes for linktrails in a default MW install.
 

Detailed Description

This file contains general utilities for token transforms.

Member Function Documentation

◆ bcp47n()

static Parsoid\Utils\Util::bcp47n (   $code)
static

Convert MediaWiki-format language code to a BCP47-compliant language code suitable for including in HTML.

See GlobalFunctions.php::wfBCP47() in MediaWiki sources.

Parameters
string $code: MediaWiki language code.
Returns
string BCP47 language code.
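
To make the conversion concrete, here is a minimal sketch that applies only the casing conventions BCP 47 recommends: lowercase language subtag, title-case four-letter script subtag, uppercase two-letter region subtag, and lowercase private-use subtags after "x". The function name bcp47CaseSketch is hypothetical, and the real implementation may do more than this (for example, remapping specific codes).

```php
<?php
// Hypothetical sketch, not Parsoid's or MediaWiki's actual code.
function bcp47CaseSketch(string $code): string {
    $segments = explode('-', $code);
    $out = [];
    foreach ($segments as $i => $seg) {
        if ($i > 0 && strtolower($segments[$i - 1]) === 'x') {
            // Subtag following a private-use "x" singleton stays lowercase.
            $out[] = strtolower($seg);
        } elseif ($i > 0 && strlen($seg) === 2) {
            // Two-letter region subtag (ISO 3166).
            $out[] = strtoupper($seg);
        } elseif ($i > 0 && strlen($seg) === 4) {
            // Four-letter script subtag (ISO 15924).
            $out[] = ucfirst(strtolower($seg));
        } else {
            $out[] = strtolower($seg);
        }
    }
    return implode('-', $out);
}
```

Under these assumptions, 'zh-hans' becomes 'zh-Hans' and 'EN-gb' becomes 'en-GB'.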

◆ clone()

static Parsoid\Utils\Util::clone (   $obj,
  $deepClone = true 
)
static

Deep clones by default.

FIXME, see T161647

Parameters
object|array $obj: Any plain object; not tokens or DOM trees.
bool $deepClone
Returns
object|array
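
The difference between a deep and a shallow clone can be sketched as follows. This is an illustrative standalone function (cloneSketch is a hypothetical name), assuming plain objects with public properties and nested arrays; it is not Parsoid's implementation.

```php
<?php
// Hypothetical sketch: clones plain objects and arrays, recursing only when
// $deepClone is true. Scalars are returned unchanged.
function cloneSketch($obj, bool $deepClone = true) {
    if (is_array($obj)) {
        if (!$deepClone) {
            return $obj; // PHP arrays are copied by value on assignment
        }
        return array_map(static function ($v) {
            return cloneSketch($v, true);
        }, $obj);
    }
    if (is_object($obj)) {
        $copy = clone $obj; // shallow copy: nested objects are still shared
        if ($deepClone) {
            foreach (get_object_vars($copy) as $key => $value) {
                $copy->$key = cloneSketch($value, true);
            }
        }
        return $copy;
    }
    return $obj;
}
```

After a deep clone, mutating a nested object in the copy leaves the original untouched; after a shallow clone, nested objects are shared between the two.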

◆ decodeURI()

static Parsoid\Utils\Util::decodeURI ( string  $s)
static

Percent-decode only valid UTF-8 characters, leaving other encoded bytes alone.

Distinct from decodeURIComponent in that certain escapes are not decoded, matching the behavior of JavaScript's decodeURI().

See also
https://www.ecma-international.org/ecma-262/6.0/#sec-decodeuri-encodeduri
Parameters
string $s: URI to be decoded.
Returns
string
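
The contrast between the two decoders can be sketched in a standalone script. The code below is an assumption-laden illustration, not Parsoid's implementation: all function and constant names are hypothetical, and the set of characters that decodeURI leaves encoded is taken from the reserved set in the ECMAScript specification (; / ? : @ & = + $ , #).

```php
<?php
// Matches one well-formed UTF-8 character at the given byte offset.
const UTF8_CHAR_AT_OFFSET = '/\G(?:[\x00-\x7F]|[\xC2-\xDF][\x80-\xBF]'
    . '|\xE0[\xA0-\xBF][\x80-\xBF]|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}'
    . '|\xED[\x80-\x9F][\x80-\xBF]|\xF0[\x90-\xBF][\x80-\xBF]{2}'
    . '|[\xF1-\xF3][\x80-\xBF]{3}|\xF4[\x80-\x8F][\x80-\xBF]{2})/';

// Hypothetical helper: decodes runs of %XX escapes, keeping the original
// escape for any byte that is not part of valid UTF-8 or is in $keepEncoded.
function percentDecodeSketch(string $s, string $keepEncoded): string {
    return preg_replace_callback(
        '/(?:%[0-9A-Fa-f]{2})+/',
        static function (array $m) use ($keepEncoded): string {
            $run = $m[0];
            $bytes = rawurldecode($run);
            $out = '';
            $i = 0;
            $n = strlen($bytes);
            while ($i < $n) {
                if (preg_match(UTF8_CHAR_AT_OFFSET, $bytes, $c, 0, $i) === 1
                    && !(strlen($c[0]) === 1 && strpos($keepEncoded, $c[0]) !== false)
                ) {
                    $out .= $c[0];
                    $i += strlen($c[0]);
                } else {
                    // Each byte corresponds to three characters of the run.
                    $out .= substr($run, 3 * $i, 3);
                    $i++;
                }
            }
            return $out;
        },
        $s
    );
}

function decodeURISketch(string $s): string {
    return percentDecodeSketch($s, ';/?:@&=+$,#');
}

function decodeURIComponentSketch(string $s): string {
    return percentDecodeSketch($s, '');
}
```

In this sketch, 'a%2Fb%20c' decodes to 'a%2Fb c' with decodeURISketch and to 'a/b c' with decodeURIComponentSketch, and a stray '%FF' is left untouched by both.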

◆ decodeURIComponent()

static Parsoid\Utils\Util::decodeURIComponent ( string  $s)
static

Percent-decode only valid UTF-8 characters, leaving other encoded bytes alone.

Parameters
string $s: URI to be decoded.
Returns
string

◆ decodeWtEntities()

static Parsoid\Utils\Util::decodeWtEntities ( string  $text)
static

Decode HTML5 entities in wikitext.

NOTE that wikitext only allows semicolon-terminated entities, while HTML allows a number of "legacy" entities to be decoded without a terminating semicolon. This function deliberately does not decode these HTML-only entity forms.

Parameters
string $text
Returns
string
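
The semicolon rule described above can be illustrated with a small sketch that only hands semicolon-terminated references to PHP's html_entity_decode(). The function name decodeSemicolonEntitiesSketch is hypothetical, and this is not Parsoid's implementation.

```php
<?php
// Hypothetical sketch: decode only entity references that end in a semicolon.
function decodeSemicolonEntitiesSketch(string $text): string {
    return preg_replace_callback(
        '/&(?:#[0-9]+|#[xX][0-9A-Fa-f]+|[A-Za-z][A-Za-z0-9]*);/',
        static function (array $m): string {
            // Unknown names are returned unchanged by html_entity_decode().
            return html_entity_decode($m[0], ENT_QUOTES | ENT_HTML5, 'UTF-8');
        },
        $text
    );
}
```

Here '&amp; &para text &#x41;&#66;' becomes '& &para text AB': the semicolon-less '&para' is never matched, so it stays as written.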

◆ entityEncodeAll()

static Parsoid\Utils\Util::entityEncodeAll ( string  $s)
static

Encode all characters as entity references.

This is done to make characters safe for wikitext (regardless of whether they are HTML-safe). Typically only called with single-codepoint strings.

Parameters
string $s
Returns
string
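
A minimal sketch of this kind of encoding follows, assuming valid UTF-8 input and hexadecimal numeric character references; whether Parsoid emits hexadecimal or decimal references is not stated here, so treat the output format as an assumption. The names codePointSketch and entityEncodeAllSketch are hypothetical.

```php
<?php
// Hypothetical helper: code point of a single valid UTF-8 character.
function codePointSketch(string $ch): int {
    $b = array_values(unpack('C*', $ch));
    switch (count($b)) {
        case 1:
            return $b[0];
        case 2:
            return (($b[0] & 0x1F) << 6) | ($b[1] & 0x3F);
        case 3:
            return (($b[0] & 0x0F) << 12) | (($b[1] & 0x3F) << 6) | ($b[2] & 0x3F);
        default:
            return (($b[0] & 0x07) << 18) | (($b[1] & 0x3F) << 12)
                | (($b[2] & 0x3F) << 6) | ($b[3] & 0x3F);
    }
}

// Hypothetical sketch: replace every character with a numeric reference.
function entityEncodeAllSketch(string $s): string {
    $out = '';
    // The /u flag splits on code points rather than bytes.
    foreach (preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY) as $ch) {
        $out .= '&#x' . strtoupper(dechex(codePointSketch($ch))) . ';';
    }
    return $out;
}
```

For instance, a lone pipe character '|' becomes '&#x7C;', so it can no longer be read as wikitext table or template syntax.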

◆ escapeHtml()

static Parsoid\Utils\Util::escapeHtml ( string  $s)
static

PORT-FIXME need accurate function description.

Parameters
string $s
Returns
string

◆ escapeWtEntities()

static Parsoid\Utils\Util::escapeWtEntities ( string  $text)
static

Entity-escape anything that would decode to a valid wikitext entity.

Note that HTML5 allows certain "semicolon-less" entities, such as "&para" written without a trailing semicolon. These aren't allowed in wikitext and won't be escaped by this function.

Parameters
string $text
Returns
string
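
One way to picture this escaping is the sketch below: it looks for semicolon-terminated references, checks with html_entity_decode() whether each one would actually decode, and if so escapes its leading ampersand. The function name escapeSemicolonEntitiesSketch is hypothetical, and the validity check via PHP's HTML5 entity table is an assumption rather than Parsoid's actual logic.

```php
<?php
// Hypothetical sketch: escape the "&" of anything that would decode as a
// semicolon-terminated entity; leave everything else alone.
function escapeSemicolonEntitiesSketch(string $text): string {
    return preg_replace_callback(
        '/&(?:#[0-9]+|#[xX][0-9A-Fa-f]+|[A-Za-z][A-Za-z0-9]*);/',
        static function (array $m): string {
            $decoded = html_entity_decode($m[0], ENT_QUOTES | ENT_HTML5, 'UTF-8');
            // Unchanged output means the reference is not a known entity.
            return $decoded === $m[0] ? $m[0] : '&amp;' . substr($m[0], 1);
        },
        $text
    );
}
```

With this sketch, 'AT&amp;T &para &para;' becomes 'AT&amp;amp;T &para &amp;para;', leaving the semicolon-less form untouched.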

◆ extractExtBody()

static Parsoid\Utils\Util::extractExtBody ( Token  $token)
static

Extract extension source from the token.

Parameters
Token $token: The token.
Returns
string

◆ getExtArgInfo()

static Parsoid\Utils\Util::getExtArgInfo ( Token  $extToken)
static

Get argument information for an extension tag token.

Parameters
Token $extToken
Returns

◆ getStar()

static Parsoid\Utils\Util::getStar (   $revision)
static

FIXME: Is this needed??

Extract content in a backwards-compatible way.

Parameters
object $revision
Returns
object

◆ isLinkTrail()

static Parsoid\Utils\Util::isLinkTrail ( string  $text)
static

Check whether some text is a valid link trail.

Parameters
string $text
Returns
bool

◆ isParsoidObjectId()

static Parsoid\Utils\Util::isParsoidObjectId ( string  $aboutId)
static

Check for Parsoid id prefix in an aboutID string.

Parameters
string $aboutId: About ID string.
Returns
bool
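
The prefix check, together with the stripping done by stripParsoidIdPrefix(), can be sketched as below. The prefix value '#mwt' is an assumption made for illustration, since this page does not state it, and the function names are hypothetical.

```php
<?php
// Assumed prefix for Parsoid-generated about IDs (not confirmed by this page).
const ASSUMED_PARSOID_ID_PREFIX = '#mwt';

// Hypothetical sketch: does the about ID start with the assumed prefix?
function isParsoidObjectIdSketch(string $aboutId): bool {
    $len = strlen(ASSUMED_PARSOID_ID_PREFIX);
    return strncmp($aboutId, ASSUMED_PARSOID_ID_PREFIX, $len) === 0;
}

// Hypothetical sketch: drop the assumed prefix when it is present.
function stripParsoidIdPrefixSketch(string $aboutId): string {
    return isParsoidObjectIdSketch($aboutId)
        ? substr($aboutId, strlen(ASSUMED_PARSOID_ID_PREFIX))
        : $aboutId;
}
```

Under that assumption, '#mwt12' is recognized as a Parsoid ID and stripped to '12', while any other string is returned unchanged.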

◆ isProtocolValid()

static Parsoid\Utils\Util::isProtocolValid (   $linkTarget,
Env  $env 
)
static

Determine whether the protocol of a link is potentially valid.

Use the environment's per-wiki config to do so.

Parameters
mixed $linkTarget
Env $env
Returns
bool

◆ isValidDSR()

static Parsoid\Utils\Util::isValidDSR ( ?DomSourceRange  $dsr,
bool  $all = false 
)
static

Check for valid DSR range(s). DSR = "DOM Source Range".

Parameters
DomSourceRange|null $dsr: DSR source range values.
bool $all: Also check the widths of the container tag.
Returns
bool

◆ isVoidElement()

static Parsoid\Utils\Util::isVoidElement ( string  $name)
static

Determine if the named tag is void (cannot have content).

Parameters
string $name: Tag name.
Returns
bool
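
A lookup of this kind can be sketched as follows, assuming the void-element set defined by the HTML Living Standard. Parsoid's own list may differ, and the case-insensitive comparison is also an assumption; the names below are hypothetical.

```php
<?php
// Void elements per the HTML Living Standard (assumed, not Parsoid's list).
const VOID_ELEMENTS_SKETCH = [
    'area', 'base', 'br', 'col', 'embed', 'hr', 'img',
    'input', 'link', 'meta', 'source', 'track', 'wbr',
];

// Hypothetical sketch: true if the tag can never have content.
function isVoidElementSketch(string $name): bool {
    return in_array(strtolower($name), VOID_ELEMENTS_SKETCH, true);
}
```

So 'br' and 'IMG' are reported as void, while 'div' is not.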

◆ magicMasqs()

static Parsoid\Utils\Util::magicMasqs ( )
static

Magic words masquerading as templates.

Returns
array

◆ normalizeNamespaceName()

static Parsoid\Utils\Util::normalizeNamespaceName ( string  $name)
static

Canonicalizes a namespace name.

Parameters
string $name: Non-normalized namespace name.
Returns
string

◆ parseMediaDimensions()

static Parsoid\Utils\Util::parseMediaDimensions ( string  $str,
bool  $onlyOne = false 
)
static

Parse media dimensions.

Parameters
string $str: Media dimension string to parse.
bool $onlyOne: If set, returns null if multiple dimensions are present.
Returns
array{x:int,y?:int}|null
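
To show how the return shape and the $onlyOne flag fit together, here is a minimal sketch that assumes the input follows the MediaWiki image size syntax "WIDTHxHEIGHTpx" with a mandatory width. The accepted syntax and the function name parseMediaDimensionsSketch are assumptions; the real method may accept other forms.

```php
<?php
// Hypothetical sketch: parse "300px" or "300x200px" into ['x' => ..., 'y' => ...].
function parseMediaDimensionsSketch(string $str, bool $onlyOne = false): ?array {
    if (preg_match('/^(\d+)(?:x(\d+))?\s*(?:px)?\s*$/D', $str, $m) !== 1) {
        return null; // not a dimension string
    }
    $dims = ['x' => (int)$m[1]];
    if (isset($m[2]) && $m[2] !== '') {
        if ($onlyOne) {
            return null; // caller asked for a single dimension only
        }
        $dims['y'] = (int)$m[2];
    }
    return $dims;
}
```

In this sketch '300x200px' gives ['x' => 300, 'y' => 200], '300px' gives ['x' => 300], and '300x200px' with $onlyOne set gives null.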

◆ phpURLEncode()

static Parsoid\Utils\Util::phpURLEncode (   $txt)
static

This should not be used.

Parameters
string $txt: URL to encode using PHP encoding.
Returns
string

◆ stripParsoidIdPrefix()

static Parsoid\Utils\Util::stripParsoidIdPrefix ( string  $aboutId)
static

Strip Parsoid id prefix from aboutID.

Parameters
string $aboutId: About ID string.
Returns
string

◆ validateMediaParam()

static Parsoid\Utils\Util::validateMediaParam ( ?int  $num)
static

Validate media parameters. More generally, this is defined by the media handler in core.

Parameters
int|null $num
Returns
bool

Member Data Documentation

◆ $linkTrailRegex

Parsoid\Utils\Util::$linkTrailRegex
static
Initial value:
=
'/^[^\0-`{÷ĀĈ-ČĎĐĒĔĖĚĜĝĠ-ĪĬ-įIJĴ-ĹĻ-ĽĿŀŅņʼnŊŌŎŏŒŔŖ-ŘŜŝŠŤŦŨŪ-ŬŮŲ-ŴŶŸ' .
'ſ-ǤǦǨǪ-Ǯǰ-ȗȜ-ȞȠ-ɘɚ-ʑʓ-ʸʽ-̂̄-΅·΋΍΢Ϗ-ЯѐѝѠѢѤѦѨѪѬѮѰѲѴѶѸѺ-ѾҀ-҃҅-ҐҒҔҕҘҚҜ-ҠҤ-ҪҬҭҰҲ' .
'Ҵ-ҶҸҹҼ-ҿӁ-ӗӚ-ӜӞӠ-ӢӤӦӪ-ӲӴӶ-ՠֈ-׏׫-ؠً-ٳٵ-ٽٿ-څڇ-ڗڙ-ڨڪ-ڬڮڰ-ڽڿ-ۅۈ-ۊۍ-۔ۖ-਀਄਋-਎਑਒' .
'਩਱਴਷਺਻਽੃-੆੉੊੎-੘੝੟-੯ੴ-჏ჱ-ẼẾ-\x{200b}\x{200d}-‒—-‗‚‛”--\x{fffd}]+$/D'

This regex was generated by running through all Unicode characters and testing them against all regexes for linktrails in a default MW install.

We had to adjust the result slightly. Here is what we changed:

  1. A-Z, though allowed in Walloon, is disallowed.
  2. '"', though allowed in Chuvash, is disallowed.
  3. '-', though allowed in Icelandic (possibly due to a bug), is disallowed.
  4. '1', though allowed in Lak (possibly due to a bug), is disallowed.

◆ COMMENT_REGEXP_FRAGMENT

const Parsoid\Utils\Util::COMMENT_REGEXP_FRAGMENT = '<!--(?>[\s\S]*?-->)'

Regular expression fragment for matching wikitext comments.

Meant for inclusion in other regular expressions.
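
The following sketch shows one way the fragment might be embedded in larger patterns. The constant value is copied from above; the two helper functions are hypothetical and not part of Parsoid.

```php
<?php
// Value copied from the constant documented above.
const COMMENT_REGEXP_FRAGMENT = '<!--(?>[\s\S]*?-->)';

// Hypothetical helper: remove all closed comments from a wikitext string.
function stripCommentsSketch(string $wikitext): string {
    $pattern = '/' . COMMENT_REGEXP_FRAGMENT . '/';
    return preg_replace($pattern, '', $wikitext) ?? $wikitext;
}

// Hypothetical helper: true if the line holds only comments and whitespace.
function isCommentOnlyLineSketch(string $line): bool {
    $pattern = '/^(?:\s|' . COMMENT_REGEXP_FRAGMENT . ')+$/D';
    return preg_match($pattern, $line) === 1;
}
```

The atomic group (?>...) matters in the second helper: once a comment has been matched up to its first "-->", the engine cannot backtrack and stretch it to a later "-->", so '<!-- a --> x <!-- b -->' is correctly rejected as not comment-only.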


The documentation for this class was generated from the following file: