Parsoid
A bidirectional parser between wikitext and HTML5
This file contains general utilities for token transforms.
Static Public Member Functions | |
static | stripParsoidIdPrefix (string $aboutId) |
Strip Parsoid id prefix from aboutID. | |
static | stripNamespace (string $className) |
Strip PHP namespace from the fully qualified class name. | |
static | isParsoidObjectId (string $aboutId) |
Check for Parsoid id prefix in an aboutID string. | |
static | isVoidElement (string $name) |
Determine if the named tag is void (cannot have content). | |
static | clone ( $obj, $deepClone=true) |
Deep clones by default. | |
static | lastUniChar (string $str, ?int $idx=null) |
Extract the last unicode character of the string. | |
static | isUniWord (string $s) |
Return true if the first character in $s is a unicode word character. | |
static | phpURLEncode ( $txt) |
This should not be used. | |
static | decodeURI (string $s) |
Percent-decode only valid UTF-8 characters, leaving other encoded bytes alone. | |
static | decodeURIComponent (string $s) |
Percent-decode only valid UTF-8 characters, leaving other encoded bytes alone. | |
static | extractExtBody (Token $token) |
Extract extension source from the token. | |
static | isValidDSR (?DomSourceRange $dsr, bool $all=false) |
Check for valid DSR range(s). DSR = "DOM Source Range". | |
static | normalizeNamespaceName (string $name) |
Canonicalizes a namespace name. | |
static | decodeWtEntities (string $text) |
Decode HTML5 entities in wikitext. | |
static | escapeWtEntities (string $text) |
Entity-escape anything that would decode to a valid wikitext entity. | |
static | escapeHtml (string $s) |
Convert special characters to HTML entities. | |
static | entityEncodeAll (string $s) |
Encode all characters as entity references. | |
static | isProtocolValid ( $linkTarget, Env $env) |
Determine whether the protocol of a link is potentially valid. | |
static | getExtArgInfo (Token $extToken) |
Get argument information for an extension tag token. | |
static | parseMediaDimensions (string $str, bool $onlyOne=false) |
Parse media dimensions. | |
static | validateMediaParam (?int $num) |
Validate media parameters. More generally, this is defined by the media handler in core. | |
static | getStar ( $revision) |
FIXME: Is this needed?? | |
static | magicMasqs () |
FIXME: This feels broken. | |
static | isLinkTrail (string $text) |
Check whether some text is a valid link trail. | |
static | bcp47n ( $code) |
Convert mediawiki-format language code to a BCP47-compliant language code suitable for including in HTML. | |
Public Attributes | |
const | COMMENT_REGEXP_FRAGMENT = '<!--(?>[\s\S]*?-->)' |
Regular expression fragment for matching wikitext comments. | |
const | COMMENT_REGEXP = '/' . self::COMMENT_REGEXP_FRAGMENT . '/' |
Regular expression for matching a wikitext comment. | |
Static Public Attributes | |
static | $linkTrailRegex |
This regex was generated by running through all unicode characters and testing them against all regexes for linktrails in a default MW install. | |
static | bcp47n ( $code) |
Convert mediawiki-format language code to a BCP47-compliant language code suitable for including in HTML.
See GlobalFunctions.php::wfBCP47() in MediaWiki sources.
string | $code | Mediawiki language code. |
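As an illustration of the kind of conversion involved, here is a minimal sketch of the case normalization conventionally used for BCP 47 tags: lowercase language, title-case four-letter script, uppercase two-letter region, and lowercase for a subtag directly after a one-letter singleton such as x. The name bcp47CaseSketch is hypothetical, and this is not Parsoid's or MediaWiki's implementation; the real function may also remap nonstandard codes.

```php
<?php
// Hypothetical sketch of BCP 47 case conventions only; not Parsoid's code.
function bcp47CaseSketch(string $code): string {
    $segments = explode('-', strtolower($code));
    $out = [];
    foreach ($segments as $i => $seg) {
        // A subtag directly after a one-letter singleton (e.g. "x") stays lowercase.
        $afterSingleton = $i > 0 && strlen($segments[$i - 1]) === 1;
        if ($i === 0 || $afterSingleton) {
            $out[] = $seg;
        } elseif (strlen($seg) === 2) {
            $out[] = strtoupper($seg);   // two letters: treated as a region
        } elseif (strlen($seg) === 4) {
            $out[] = ucfirst($seg);      // four letters: treated as a script
        } else {
            $out[] = $seg;
        }
    }
    return implode('-', $out);
}
```

For example, bcp47CaseSketch('zh-hans-cn') yields zh-Hans-CN, while de-x-formal is returned unchanged.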
static | clone ( $obj, $deepClone=true) |
Deep clones by default.
FIXME, see T161647
object|array | $obj | Any plain object; not tokens or DOM trees. |
bool | $deepClone |
static | decodeURI (string $s) |
Percent-decode only valid UTF-8 characters, leaving other encoded bytes alone.
Distinct from decodeURIComponent in that certain escapes are not decoded, matching the behavior of JavaScript's decodeURI().
string | $s | URI to be decoded |
static | decodeURIComponent (string $s) |
Percent-decode only valid UTF-8 characters, leaving other encoded bytes alone.
string | $s | URI to be decoded |
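The difference between the two decoders can be sketched as follows. This is a minimal illustration, not Parsoid's implementation: percentDecodeSketch and URI_RESERVED are hypothetical names, and the reserved set is the one JavaScript's decodeURI() leaves encoded. Only escape runs that form one well-formed UTF-8 character are decoded; everything else is returned untouched.

```php
<?php
// Characters that JavaScript's decodeURI() leaves percent-encoded.
const URI_RESERVED = ';/?:@&=+$,#';

// Hypothetical helper: $keepReserved = true mimics decodeURI(),
// false mimics decodeURIComponent().
function percentDecodeSketch(string $s, bool $keepReserved): string {
    // One alternative per UTF-8 sequence length (1 to 4 bytes).
    $oneChar = '/%[0-7][0-9A-F]'
        . '|%[CD][0-9A-F]%[89AB][0-9A-F]'
        . '|%E[0-9A-F](?:%[89AB][0-9A-F]){2}'
        . '|%F[0-7](?:%[89AB][0-9A-F]){3}/i';
    return preg_replace_callback($oneChar, function (array $m) use ($keepReserved): string {
        $bytes = rawurldecode($m[0]);
        // preg_match() with /u rejects malformed UTF-8 (overlong forms, surrogates).
        if (preg_match('//u', $bytes) !== 1) {
            return $m[0];
        }
        if ($keepReserved && strlen($bytes) === 1 && strpos(URI_RESERVED, $bytes) !== false) {
            return $m[0];
        }
        return $bytes;
    }, $s);
}
```

With the input a%20b%2Fc, the decodeURI-like mode keeps %2F encoded while the decodeURIComponent-like mode turns it into a slash. A stray byte such as %FF is left alone in both modes.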
static | decodeWtEntities (string $text) |
Decode HTML5 entities in wikitext.
NOTE that wikitext only allows semicolon-terminated entities, while HTML allows a number of "legacy" entities to be decoded without a terminating semicolon. This function deliberately does not decode these HTML-only entity forms.
string | $text |
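The semicolon rule described in the note can be sketched with PHP's built-in html_entity_decode(). This is a minimal illustration under the assumption that PHP's HTML5 entity table is an acceptable stand-in; decodeWtEntitiesSketch is a hypothetical name and not Parsoid's implementation.

```php
<?php
// Hypothetical sketch: decode only semicolon-terminated entities.
function decodeWtEntitiesSketch(string $text): string {
    // Candidates must end in ";", so legacy forms like "&para" never match.
    $candidate = '/&(?:#[0-9]+|#[xX][0-9A-Fa-f]+|[A-Za-z][A-Za-z0-9]*);/';
    return preg_replace_callback($candidate, function (array $m): string {
        // html_entity_decode() returns unknown names unchanged.
        return html_entity_decode($m[0], ENT_QUOTES | ENT_HTML5, 'UTF-8');
    }, $text);
}
```

Here a terminated entity such as &amp; is decoded, while a bare &para followed by a space is left as literal text.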
static | entityEncodeAll (string $s) |
Encode all characters as entity references.
This is done to make characters safe for wikitext (regardless of whether they are HTML-safe). Typically only called with single-codepoint strings.
string | $s |
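A minimal sketch of encoding every code point as a numeric character reference follows. The names entityEncodeAllSketch and utf8CodePoint are hypothetical, the uppercase hexadecimal output format is an assumption, and the input is assumed to be valid UTF-8; Parsoid's own implementation may differ.

```php
<?php
// Hypothetical helper: code point of a single well-formed UTF-8 character.
function utf8CodePoint(string $ch): int {
    $len = strlen($ch);
    $first = ord($ch[0]);
    // Mask off the length-marker bits of the lead byte.
    $cp = $len === 1 ? $first : $first & (0xFF >> ($len + 1));
    for ($i = 1; $i < $len; $i++) {
        $cp = ($cp << 6) | (ord($ch[$i]) & 0x3F);
    }
    return $cp;
}

// Hypothetical sketch: every character becomes "&#xHEX;".
function entityEncodeAllSketch(string $s): string {
    $out = '';
    // The /u flag makes preg_split() cut at code point boundaries.
    foreach (preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY) as $ch) {
        $out .= '&#x' . strtoupper(dechex(utf8CodePoint($ch))) . ';';
    }
    return $out;
}
```

Because even ordinary letters are encoded, the result cannot be read as wikitext markup, which fits the single-codepoint use described above.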
static | escapeHtml (string $s) |
Convert special characters to HTML entities.
string | $s |
static | escapeWtEntities (string $text) |
Entity-escape anything that would decode to a valid wikitext entity.
Note that HTML5 allows certain "semicolon-less" entities, like &para (no trailing semicolon); these aren't allowed in wikitext and won't be escaped by this function.
string | $text |
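The escaping rule can be sketched as the mirror image of decoding: an ampersand is escaped only when it begins a semicolon-terminated sequence that would actually decode. This is a minimal illustration using PHP's HTML5 entity table as a stand-in; escapeWtEntitiesSketch is a hypothetical name and not Parsoid's implementation.

```php
<?php
// Hypothetical sketch: escape "&" only where a real entity would be decoded.
function escapeWtEntitiesSketch(string $text): string {
    $candidate = '/&(?:#[0-9]+|#[xX][0-9A-Fa-f]+|[A-Za-z][A-Za-z0-9]*);/';
    return preg_replace_callback($candidate, function (array $m): string {
        $decoded = html_entity_decode($m[0], ENT_QUOTES | ENT_HTML5, 'UTF-8');
        // Unchanged output means the candidate is not a known entity: leave it.
        return $decoded === $m[0] ? $m[0] : '&amp;' . substr($m[0], 1);
    }, $text);
}
```

Plain ampersands (as in AT&T), semicolon-less forms and unknown names pass through untouched, so only text that would otherwise change meaning is altered.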
static | extractExtBody (Token $token) |
Extract extension source from the token.
Token | $token | token |
static | getExtArgInfo (Token $extToken) |
Get argument information for an extension tag token.
Token | $extToken |
static | getStar ( $revision) |
FIXME: Is this needed??
Extract content in a backwards compatible way
object | $revision |
static | isLinkTrail (string $text) |
Check whether some text is a valid link trail.
string | $text |
static | isParsoidObjectId (string $aboutId) |
Check for Parsoid id prefix in an aboutID string.
string | $aboutId | about ID string |
static | isProtocolValid ( $linkTarget, Env $env) |
Determine whether the protocol of a link is potentially valid.
Use the environment's per-wiki config to do so.
mixed | $linkTarget | |
Env | $env |
static | isUniWord (string $s) |
Return true if the first character in $s is a unicode word character.
string | $s |
static | isValidDSR (?DomSourceRange $dsr, bool $all=false) |
Check for valid DSR range(s). DSR = "DOM Source Range".
?DomSourceRange | $dsr | DSR source range values |
bool | $all | Also check the widths of the container tag |
static | isVoidElement (string $name) |
Determine if the named tag is void (cannot have content).
string | $name | tag name |
static | lastUniChar (string $str, ?int $idx=null) |
Extract the last unicode character of the string.
This might be more than one byte, if the last character is non-ASCII.
string | $str | |
?int | $idx | The index after the character to extract; defaults to the length of $str, which will extract the last character in $str. |
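One way to extract a trailing multi-byte character is to step backwards over UTF-8 continuation bytes. The sketch below assumes $idx is a byte offset and that out-of-range values yield an empty string; both are assumptions, lastUniCharSketch is a hypothetical name, and this is not Parsoid's implementation.

```php
<?php
// Hypothetical sketch: return the UTF-8 character that ends just before byte $idx.
function lastUniCharSketch(string $str, ?int $idx = null): string {
    $idx = $idx ?? strlen($str);
    if ($idx <= 0 || $idx > strlen($str)) {
        return '';
    }
    $start = $idx - 1;
    // Continuation bytes have the bit pattern 10xxxxxx; skip back to the lead byte.
    while ($start > 0 && (ord($str[$start]) & 0xC0) === 0x80) {
        $start--;
    }
    return substr($str, $start, $idx - $start);
}
```

For an ASCII string this returns a single byte; for a string ending in a non-ASCII character it returns the whole two-to-four-byte sequence.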
static | magicMasqs () |
FIXME: This feels broken.
Magic words masquerading as templates.
static | normalizeNamespaceName (string $name) |
Canonicalizes a namespace name.
string | $name | Non-normalized namespace name. |
static | parseMediaDimensions (string $str, bool $onlyOne=false) |
Parse media dimensions.
string | $str | media dimension string to parse |
bool | $onlyOne | If set, returns null if multiple dimensions are present |
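A minimal sketch of this kind of parsing is shown here. The accepted syntax (W, WxH or xH, optionally followed by px), the ['x' => ..., 'y' => ...] return shape and the name parseMediaDimensionsSketch are all assumptions made for illustration; the text above does not specify them.

```php
<?php
// Hypothetical sketch; input syntax and return shape are assumptions.
function parseMediaDimensionsSketch(string $str, bool $onlyOne = false): ?array {
    // Accepts "W", "WxH" or "xH", optionally followed by "px".
    if (preg_match('/^(\d*)(?:x(\d+))?\s*(?:px\s*)?$/D', $str, $m) !== 1) {
        return null;
    }
    $width = $m[1] !== '' ? (int)$m[1] : null;
    $height = isset($m[2]) && $m[2] !== '' ? (int)$m[2] : null;
    // With $onlyOne set, reject strings that carry both dimensions.
    if ($onlyOne && $width !== null && $height !== null) {
        return null;
    }
    return ['x' => $width, 'y' => $height];
}
```

Under these assumptions, 100x200px parses to a width of 100 and a height of 200, and the same string with $onlyOne set yields null.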
static | phpURLEncode ( $txt) |
This should not be used.
string | $txt | URL to encode using PHP encoding |
static | stripNamespace (string $className) |
Strip PHP namespace from the fully qualified class name.
string | $className |
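Stripping a namespace amounts to keeping whatever follows the last backslash. A minimal sketch, with the hypothetical name stripNamespaceSketch (not Parsoid's implementation):

```php
<?php
// Hypothetical sketch: keep the part after the last namespace separator.
function stripNamespaceSketch(string $className): string {
    $pos = strrpos($className, '\\');
    return $pos === false ? $className : substr($className, $pos + 1);
}
```

For example, Wikimedia\Parsoid\Utils\Utils becomes Utils, and a name without a namespace is returned unchanged.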
static | stripParsoidIdPrefix (string $aboutId) |
Strip Parsoid id prefix from aboutID.
string | $aboutId | about ID string |
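The prefix check and the prefix strip can be sketched together. The sketch assumes that Parsoid-generated about IDs look like #mwt12, that is, an mwt prefix with a leading #; the prefix is not stated in the text above, so treat it and the function names as assumptions rather than the documented behavior.

```php
<?php
// Assumption: the id prefix is "mwt"; it is not given in the documentation above.
const ASSUMED_ID_PREFIX = 'mwt';

// Hypothetical sketch: true when the id starts with "#" plus the assumed prefix.
function isParsoidObjectIdSketch(string $aboutId): bool {
    return preg_match('/^#' . ASSUMED_ID_PREFIX . '/', $aboutId) === 1;
}

// Hypothetical sketch: drop an optional "#" and the assumed prefix.
function stripParsoidIdPrefixSketch(string $aboutId): string {
    return preg_replace('/^#?' . ASSUMED_ID_PREFIX . '/', '', $aboutId);
}
```

Under that assumption, #mwt12 is recognized and stripped to 12, while an unrelated ID is left as it is.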
static | validateMediaParam (?int $num) |
Validate media parameters. More generally, this is defined by the media handler in core.
?int | $num |
static | $linkTrailRegex |
This regex was generated by running through all unicode characters and testing them against all regexes for linktrails in a default MW install.
We had to treat it a little bit, here's what we changed:
const Wikimedia\Parsoid\Utils\Utils::COMMENT_REGEXP_FRAGMENT = '<!--(?>[\s\S]*?-->)' |
Regular expression fragment for matching wikitext comments.
Meant for inclusion in other regular expressions.
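As a sketch of what inclusion in another regular expression might look like, the example below embeds the fragment in a hypothetical pattern that accepts only strings made up entirely of comments and whitespace. The fragment is copied from the constant above; isOnlyComments is an illustrative name and not part of Parsoid.

```php
<?php
// Fragment copied from the documented constant value.
const COMMENT_REGEXP_FRAGMENT = '<!--(?>[\s\S]*?-->)';

// Hypothetical composite pattern: one or more comments separated by whitespace.
function isOnlyComments(string $wt): bool {
    $re = '/^(?:\s*' . COMMENT_REGEXP_FRAGMENT . ')+\s*$/';
    return preg_match($re, $wt) === 1;
}
```

The atomic group (?>...) matters when the fragment is embedded like this: once a comment has been matched up to its first closing marker, the engine cannot backtrack and stretch it to a later one. A string with text between two comments is therefore rejected instead of being treated as one long comment.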