Parsoid
A bidirectional parser between wikitext and HTML5
Parsoid\Utils\WTUtils Class Reference

These utilities pertain to extracting / modifying wikitext information from the DOM. More...

Static Public Member Functions

static hasLiteralHTMLMarker (stdClass $dp)
 Check whether a node's data-parsoid object includes an indicator that the original wikitext was a literal HTML element (like table or p) More...
 
static isLiteralHTMLNode (?DOMNode $node)
 Run a node through hasLiteralHTMLMarker. More...
 
static isZeroWidthWikitextElt (DOMNode $node)
 
static isBlockNodeWithVisibleWT (DOMNode $node)
 Is $node a block node that is also visible in wikitext? An example of an invisible block node is a <p> tag that Parsoid generated, or a <ul> or <ol> tag. More...
 
static usesWikiLinkSyntax (DOMElement $node, ?stdClass $dp)
 Helper function to detect when an A-node uses [[..]]/[..]/... style syntax. More...
 
static usesExtLinkSyntax (DOMElement $node, ?stdClass $dp)
 Helper function to detect when an A-node uses ext-link syntax. More...
 
static usesURLLinkSyntax (DOMElement $node, stdClass $dp=null)
 Helper function to detect when an A-node uses url-link syntax. More...
 
static usesMagicLinkSyntax (DOMElement $node, stdClass $dp=null)
 Helper function to detect when an A-node uses magic-link syntax. More...
 
static isTplMetaType (string $nType)
 Check whether a meta's typeof indicates that it is a template expansion. More...
 
static hasExpandedAttrsType (DOMElement $node)
 Check whether a typeof indicates that it signifies an expanded attribute. More...
 
static isTplMarkerMeta (DOMNode $node)
 Check whether a node is a meta tag that signifies a template expansion. More...
 
static isTplStartMarkerMeta (DOMNode $node)
 Check whether a node is a meta signifying the start of a template expansion. More...
 
static isTplEndMarkerMeta (DOMNode $node)
 Check whether a node is a meta signifying the end of a template expansion. More...
 
static isNewElt (DOMNode $node)
 Tests whether a DOM node is a new node added during an edit session or an existing node from parsed wikitext. More...
 
static isIndentPre (DOMNode $node)
 Check whether a pre is caused by indentation in the original wikitext. More...
 
static isInlineMedia (DOMNode $node)
 
static isGeneratedFigure (DOMNode $node)
 
static indentPreDSRCorrection (DOMNode $textNode)
 Find how much offset is necessary for the DSR of an indent-originated pre tag. More...
 
static hasParsoidAboutId (DOMNode $node)
 Check if $node is an element node that belongs to a template/extension. More...
 
static isRedirectLink (DOMNode $node)
 Does $node represent a redirect link? More...
 
static isCategoryLink (?DOMNode $node)
 Does $node represent a category link? More...
 
static isSolTransparentLink (DOMNode $node)
 Does $node represent a link that is sol-transparent? More...
 
static emitsSolTransparentSingleLineWT (DOMNode $node)
 Check if '$node' emits wikitext that is sol-transparent in wikitext form. More...
 
static isFallbackIdSpan (DOMNode $node)
 This is the span added to headings to add fallback ids for when legacy and HTML5 ids don't match up. More...
 
static isRenderingTransparentNode (DOMNode $node)
 These are primarily 'metadata'-like nodes that don't show up in output rendering. More...
 
static inHTMLTableTag (DOMNode $node)
 Is $node nested inside a table tag that uses HTML instead of native wikitext? More...
 
static isFirstEncapsulationWrapperNode (DOMNode $node)
 Is $node the first wrapper element of encapsulated content? More...
 
static isEncapsulationWrapper (DOMNode $node)
 Is $node an encapsulation wrapper elt? More...
 
static isDOMFragmentWrapper (DOMNode $node)
 Is $node a DOMFragment wrapper? More...
 
static isSealedFragmentOfType (DOMNode $node, string $type)
 Is $node a sealed DOMFragment of a specific type? More...
 
static isParsoidSectionTag (DOMNode $node)
 Is $node a Parsoid-generated <section> tag? More...
 
static fromExtensionContent (DOMNode $node, string $extType)
 Is the $node from extension content? More...
 
static getWTSource (Frame $frame, DOMElement $node)
 Compute, when possible, the wikitext source for $node within the given $frame. More...
 
static getAboutSiblings (DOMNode $node, string $about)
 Gets all siblings following $node whose about id matches $about. More...
 
static skipOverEncapsulatedContent (DOMNode $node)
 This function is only intended to be used on encapsulated nodes (Template/Extension/Param content). More...
 
static encodeComment (string $comment)
 Comment encoding/decoding. More...
 
static decodeComment (string $comment)
 Map an HTML DOM-escaped comment to a wikitext-escaped comment. More...
 
static decodedCommentLength ( $node)
 Utility function: we often need to know the wikitext DSR length for an HTML DOM comment value. More...
 
static escapeNowikiTags (string $text)
 Escape <nowiki> tags. More...
 
static fosterCommentData (string $typeOf, array $attrs, bool $encode)
 Encoding is conditional because, during tree building, the value passes directly from token to DOM node; the comment is never stringified and re-parsed, which is the step where comment encoding would be necessary. More...
 
static reinsertFosterableContent (Env $env, DOMNode $node, bool $decode)
 
static getNativeExt (Env $env, DOMNode $node)
 

Public Attributes

const FIRST_ENCAP_REGEXP
 

Detailed Description

These utilities pertain to extracting / modifying wikitext information from the DOM.

Member Function Documentation

◆ decodeComment()

static Parsoid\Utils\WTUtils::decodeComment ( string  $comment)
static

Map an HTML DOM-escaped comment to a wikitext-escaped comment.

Parameters
string $comment  DOM-escaped comment.
Returns
string Wikitext-escaped comment.

◆ decodedCommentLength()

static Parsoid\Utils\WTUtils::decodedCommentLength (   $node)
static

Utility function: we often need to know the wikitext DSR length for an HTML DOM comment value.

Parameters
DOMComment | CommentTk | string $node  A comment node containing a DOM-escaped comment.
Returns
int The wikitext length in UTF-8 bytes necessary to encode this comment, including 7 characters for the "<!--" and "-->" delimiters.
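
The length computation can be illustrated with a minimal sketch. It assumes the comment text has already been mapped back to its wikitext-escaped form; `sketchCommentDsrLength` is a hypothetical helper and not part of WTUtils. It only shows that the result counts bytes rather than characters and adds 7 for the "<!--" and "-->" delimiters.

```php
<?php
// Hypothetical helper (not part of WTUtils): takes wikitext-escaped comment
// text, the kind of string decodeComment() returns, and computes its length.
function sketchCommentDsrLength( string $wikitextComment ): int {
	// "<!--" is 4 bytes and "-->" is 3 bytes, 7 in total.
	$delimiterLen = strlen( '<!--' ) + strlen( '-->' );
	// strlen() counts bytes, so a multi-byte UTF-8 character counts more than once.
	return strlen( $wikitextComment ) + $delimiterLen;
}

echo sketchCommentDsrLength( ' note ' ), "\n";
// "\u{E9}" is a two-byte character in UTF-8.
echo sketchCommentDsrLength( "caf\u{E9}" ), "\n";
```

Counting bytes matters because DSR offsets index into the UTF-8 wikitext source, so a character count would drift on non-ASCII comments.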

◆ emitsSolTransparentSingleLineWT()

static Parsoid\Utils\WTUtils::emitsSolTransparentSingleLineWT ( DOMNode  $node)
static

Check if '$node' emits wikitext that is sol-transparent in wikitext form.

This is a test for wikitext that doesn't introduce line breaks.

Comments, whitespace text nodes, category links, redirect links, behavior switches, and include directives currently satisfy this definition.

This should come close to matching TokenUtils.isSolTransparent().

Parameters
DOMNode $node
Returns
bool

◆ encodeComment()

static Parsoid\Utils\WTUtils::encodeComment ( string  $comment)
static

Comment encoding/decoding.

  • Some relevant phab tickets: T94055, T70146, T60184, T95039

The wikitext comment rule is very simple: <!-- starts a comment, and --> ends a comment. This means we can have almost anything as the contents of a comment (except the string "-->", but see below), including several things that are not valid in HTML5 comments:

  • For one, the html5 comment parsing algorithm [0] leniently accepts --!> as a closing comment tag, which differs from the php+tidy combo.
  • If the comment's data matches /^-?>/, html5 will end the comment. For example, <!-->stuff<---> breaks up as <!--> (the comment) followed by stuff<---> (as text).
  • Finally, comment data shouldn't contain two consecutive hyphen-minus characters (--), nor end in a hyphen-minus character (/-$/), as defined in the spec [1].

We work around all these problems by using HTML entity encoding inside the comment body. The characters -, >, and & must be encoded in order to prevent premature termination of the comment by one of the cases above. Encoding other characters is optional; all entities will be decoded during wikitext serialization.

In order to allow arbitrary content inside a wikitext comment, including the forbidden string "-->" we also do some minimal entity decoding on the wikitext. We are also limited by our inability to encode DSR attributes on the comment $node, so our wikitext entity decoding must be 1-to-1: that is, there must be a unique "decoded" string for every wikitext sequence, and for every decoded string there must be a unique wikitext which creates it.

The basic idea here is to replace every string ab*c with the string with one more b in it. This creates a string with no instance of "ac", so you can use 'ac' to encode one more code point. In this case a is "--&", "b" is "amp;", and "c" is "gt;" and we use ac to encode "-->" (which is otherwise unspeakable in wikitext).

Note that any user content which does not match the regular expression /--(>|&(amp;)*gt;)/ is unchanged in its wikitext representation, as shown in the first two examples below.

User-authored comment text    Wikitext          HTML5 DOM
--------------------------    --------          ---------
& - >                         & - >             &amp; &#45; &gt;
Use &gt; here                 Use &gt; here     Use &amp;gt; here
-->                           --&gt;            &#45;&#45;&gt;
--&gt;                        --&amp;gt;        &#45;&#45;&amp;gt;
--&amp;gt;                    --&amp;amp;gt;    &#45;&#45;&amp;amp;gt;

[0] http://www.w3.org/TR/html5/syntax.html#comment-start-state
[1] http://www.w3.org/TR/html5/syntax.html#comments

Map a wikitext-escaped comment to an HTML DOM-escaped comment.

Parameters
string $comment  Wikitext-escaped comment.
Returns
string DOM-escaped comment.
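
As a rough illustration of the scheme described above, the following sketch round-trips comment text between its wikitext-escaped and DOM-escaped forms. It is not Parsoid's implementation: the names `sketchEncodeComment` and `sketchDecodeComment` are hypothetical, PHP's `html_entity_decode()` stands in for Parsoid's own entity decoder, and the use of hexadecimal numeric entities is an assumption made for this example.

```php
<?php
// Hypothetical stand-ins for encodeComment()/decodeComment().
// Assumption: html_entity_decode() approximates Parsoid's entity decoding.

/** Map a wikitext-escaped comment to a DOM-escaped comment. */
function sketchEncodeComment( string $comment ): string {
	// Undo the wikitext escaping: "--&gt;" becomes "-->" and
	// "--&amp;gt;" becomes "--&gt;" (one "amp;" is removed).
	$trueValue = preg_replace_callback(
		'/--&(amp;)*gt;/',
		static function ( array $m ): string {
			return html_entity_decode( $m[0], ENT_QUOTES | ENT_HTML5, 'UTF-8' );
		},
		$comment
	);
	// Entity-encode '-', '>' and '&' so the text is safe inside an HTML comment.
	return preg_replace_callback(
		'/[->&]/',
		static function ( array $m ): string {
			return '&#x' . strtoupper( dechex( ord( $m[0] ) ) ) . ';';
		},
		$trueValue
	);
}

/** Map a DOM-escaped comment back to a wikitext-escaped comment. */
function sketchDecodeComment( string $comment ): string {
	// Undo the HTML entity escaping to recover the true comment text.
	$trueValue = html_entity_decode( $comment, ENT_QUOTES | ENT_HTML5, 'UTF-8' );
	// Re-escape so the literal string "-->" never appears in wikitext:
	// "-->" becomes "--&gt;" and "--&(amp;)*gt;" gains one more "amp;".
	return preg_replace_callback(
		'/--(&(amp;)*gt;|>)/',
		static function ( array $m ): string {
			return $m[0] === '-->' ? '--&gt;' : '--&amp;' . substr( $m[0], 3 );
		},
		$trueValue
	);
}

// Print wikitext form, DOM form, and the wikitext form recovered from the DOM.
foreach ( [ '& - >', 'Use &gt; here', '--&gt;', '--&amp;gt;' ] as $wt ) {
	$dom = sketchEncodeComment( $wt );
	echo $wt, ' => ', $dom, ' => ', sketchDecodeComment( $dom ), "\n";
}
```

Only the wikitext-side mapping has to be 1-to-1; the DOM-side encoding is free to encode more characters than strictly needed, since every entity is decoded again during serialization.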

◆ escapeNowikiTags()

static Parsoid\Utils\WTUtils::escapeNowikiTags ( string  $text)
static

Escape <nowiki> tags.

Parameters
string $text
Returns
string

◆ fosterCommentData()

static Parsoid\Utils\WTUtils::fosterCommentData ( string  $typeOf,
array  $attrs,
bool  $encode 
)
static

Encoding is conditional because, during tree building, the value passes directly from token to DOM node; the comment is never stringified and re-parsed, which is the step where comment encoding would be necessary.

Parameters
string $typeOf
array $attrs
bool $encode
Returns
string

◆ fromExtensionContent()

static Parsoid\Utils\WTUtils::fromExtensionContent ( DOMNode  $node,
string  $extType 
)
static

Is the $node from extension content?

Parameters
DOMNode $node
string $extType
Returns
bool

◆ getAboutSiblings()

static Parsoid\Utils\WTUtils::getAboutSiblings ( DOMNode  $node,
string  $about 
)
static

Gets all siblings following $node whose about id matches $about.

This is used to fetch transclusion/extension content by using the about-id as the key. This works because transclusion/extension content is a forest of dom-trees formed by adjacent dom-nodes. This is the contract that template encapsulation, dom-reuse, and VE code all have to abide by.

The only exception to this adjacency rule is IEW (inter-element whitespace) nodes in fosterable positions (in tables), which are not span-wrapped to prevent them from getting fostered out.

Parameters
DOMNode $node
string $about
Returns
DOMNode[]
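
The adjacency contract can be illustrated with a simplified sketch built on PHP's DOM extension. `sketchAboutSiblings` is a hypothetical function, the about id `#mwt1` is an illustrative value, and returning the starting node as the first array entry is an assumption about the return shape. The IEW exception for fosterable positions is deliberately left out.

```php
<?php
// Simplified sketch: collect $node plus the adjacent following siblings that
// carry the same about id. The IEW exception is ignored here.
function sketchAboutSiblings( DOMNode $node, string $about ): array {
	$nodes = [ $node ];
	$next = $node->nextSibling;
	// Stop at the first sibling that is not an element with a matching about id.
	while ( $next instanceof DOMElement && $next->getAttribute( 'about' ) === $about ) {
		$nodes[] = $next;
		$next = $next->nextSibling;
	}
	return $nodes;
}

// Build a tiny document: two spans sharing an about id, then an unrelated <p>.
$doc = new DOMDocument();
$body = $doc->appendChild( $doc->createElement( 'body' ) );
foreach ( [ [ 'span', '#mwt1' ], [ 'span', '#mwt1' ], [ 'p', '' ] ] as [ $tag, $about ] ) {
	$el = $doc->createElement( $tag );
	if ( $about !== '' ) {
		$el->setAttribute( 'about', $about );
	}
	$body->appendChild( $el );
}

$siblings = sketchAboutSiblings( $body->firstChild, '#mwt1' );
echo count( $siblings ), "\n";
```

Because the walk stops at the first non-matching sibling, a later node that happens to reuse the same about id would not be collected, which is why the adjacency contract matters.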

◆ getNativeExt()

static Parsoid\Utils\WTUtils::getNativeExt ( Env  $env,
DOMNode  $node 
)
static
Parameters
Env $env
DOMNode $node
Returns
?ExtensionTag

◆ getWTSource()

static Parsoid\Utils\WTUtils::getWTSource ( Frame  $frame,
DOMElement  $node 
)
static

Compute, when possible, the wikitext source for $node within the given $frame.

Returns null if the source cannot be extracted.

Parameters
Frame $frame
DOMElement $node
Returns
string|null

◆ hasExpandedAttrsType()

static Parsoid\Utils\WTUtils::hasExpandedAttrsType ( DOMElement  $node)
static

Check whether a typeof indicates that it signifies an expanded attribute.

Parameters
DOMElement $node
Returns
bool

◆ hasLiteralHTMLMarker()

static Parsoid\Utils\WTUtils::hasLiteralHTMLMarker ( stdClass  $dp)
static

Check whether a node's data-parsoid object includes an indicator that the original wikitext was a literal HTML element (like table or p)

Parameters
stdClass $dp
Returns
bool

◆ hasParsoidAboutId()

static Parsoid\Utils\WTUtils::hasParsoidAboutId ( DOMNode  $node)
static

Check if $node is an element node that belongs to a template/extension.

NOTE: Use with caution. This technique works reliably for the root level elements of tpl-content DOM subtrees since only they are guaranteed to be marked and nested content might not necessarily be marked.

Parameters
DOMNode $node
Returns
bool

◆ indentPreDSRCorrection()

static Parsoid\Utils\WTUtils::indentPreDSRCorrection ( DOMNode  $textNode)
static

Find how much offset is necessary for the DSR of an indent-originated pre tag.

Parameters
DOMNode $textNode
Returns
int

◆ inHTMLTableTag()

static Parsoid\Utils\WTUtils::inHTMLTableTag ( DOMNode  $node)
static

Is $node nested inside a table tag that uses HTML instead of native wikitext?

Parameters
DOMNode $node
Returns
bool

◆ isBlockNodeWithVisibleWT()

static Parsoid\Utils\WTUtils::isBlockNodeWithVisibleWT ( DOMNode  $node)
static

Is $node a block node that is also visible in wikitext? An example of an invisible block node is a <p> tag that Parsoid generated, or a <ul> or <ol> tag.

Parameters
DOMNode $node
Returns
bool

◆ isCategoryLink()

static Parsoid\Utils\WTUtils::isCategoryLink ( ?DOMNode  $node)
static

Does $node represent a category link?

Parameters
DOMNode | null $node
Returns
bool

◆ isDOMFragmentWrapper()

static Parsoid\Utils\WTUtils::isDOMFragmentWrapper ( DOMNode  $node)
static

Is $node a DOMFragment wrapper?

Parameters
DOMNode $node
Returns
bool

◆ isEncapsulationWrapper()

static Parsoid\Utils\WTUtils::isEncapsulationWrapper ( DOMNode  $node)
static

Is $node an encapsulation wrapper elt?

All root-level nodes of generated content are considered encapsulation wrappers and share an about-id.

Parameters
DOMNode $node
Returns
bool

◆ isFallbackIdSpan()

static Parsoid\Utils\WTUtils::isFallbackIdSpan ( DOMNode  $node)
static

This is the span added to headings to add fallback ids for when legacy and HTML5 ids don't match up.

This prevents broken links to legacy ids.

Parameters
DOMNode $node
Returns
bool

◆ isFirstEncapsulationWrapperNode()

static Parsoid\Utils\WTUtils::isFirstEncapsulationWrapperNode ( DOMNode  $node)
static

Is $node the first wrapper element of encapsulated content?

Parameters
DOMNode $node
Returns
bool

◆ isGeneratedFigure()

static Parsoid\Utils\WTUtils::isGeneratedFigure ( DOMNode  $node)
static
Parameters
DOMNode $node
Returns
bool

◆ isIndentPre()

static Parsoid\Utils\WTUtils::isIndentPre ( DOMNode  $node)
static

Check whether a pre is caused by indentation in the original wikitext.

Parameters
DOMNode $node
Returns
bool

◆ isInlineMedia()

static Parsoid\Utils\WTUtils::isInlineMedia ( DOMNode  $node)
static
Parameters
DOMNode $node
Returns
bool

◆ isLiteralHTMLNode()

static Parsoid\Utils\WTUtils::isLiteralHTMLNode ( ?DOMNode  $node)
static

Run a node through hasLiteralHTMLMarker.

Parameters
DOMNode | null $node
Returns
bool

◆ isNewElt()

static Parsoid\Utils\WTUtils::isNewElt ( DOMNode  $node)
static

Tests whether a DOM node is a new node added during an edit session or an existing node from parsed wikitext.

As written, this function can only be used on non-template/extension content or on the top-level nodes of template/extension content. It returns wrong results on non-top-level nodes of template/extension content.

Parameters
DOMNode $node
Returns
bool

◆ isParsoidSectionTag()

static Parsoid\Utils\WTUtils::isParsoidSectionTag ( DOMNode  $node)
static

Is $node a Parsoid-generated <section> tag?

Parameters
DOMNode $node
Returns
bool

◆ isRedirectLink()

static Parsoid\Utils\WTUtils::isRedirectLink ( DOMNode  $node)
static

Does $node represent a redirect link?

Parameters
DOMNode $node
Returns
bool

◆ isRenderingTransparentNode()

static Parsoid\Utils\WTUtils::isRenderingTransparentNode ( DOMNode  $node)
static

These are primarily 'metadata'-like nodes that don't show up in output rendering.

  • In Parsoid output, they are represented by link/meta tags.
  • In the PHP parser, they are completely stripped from the input early on.

Because of this property, these rendering-transparent nodes are also SOL-transparent for the purposes of parsing behavior.

Parameters
DOMNode $node
Returns
bool

◆ isSealedFragmentOfType()

static Parsoid\Utils\WTUtils::isSealedFragmentOfType ( DOMNode  $node,
string  $type 
)
static

Is $node a sealed DOMFragment of a specific type?

Parameters
DOMNode $node
string $type
Returns
bool

◆ isSolTransparentLink()

static Parsoid\Utils\WTUtils::isSolTransparentLink ( DOMNode  $node)
static

Does $node represent a link that is sol-transparent?

Parameters
DOMNode $node
Returns
bool

◆ isTplEndMarkerMeta()

static Parsoid\Utils\WTUtils::isTplEndMarkerMeta ( DOMNode  $node)
static

Check whether a node is a meta signifying the end of a template expansion.

Parameters
DOMNode $node
Returns
bool

◆ isTplMarkerMeta()

static Parsoid\Utils\WTUtils::isTplMarkerMeta ( DOMNode  $node)
static

Check whether a node is a meta tag that signifies a template expansion.

Parameters
DOMNode $node
Returns
bool

◆ isTplMetaType()

static Parsoid\Utils\WTUtils::isTplMetaType ( string  $nType)
static

Check whether a meta's typeof indicates that it is a template expansion.

Parameters
string $nType
Returns
bool

◆ isTplStartMarkerMeta()

static Parsoid\Utils\WTUtils::isTplStartMarkerMeta ( DOMNode  $node)
static

Check whether a node is a meta signifying the start of a template expansion.

Parameters
DOMNode $node
Returns
bool

◆ isZeroWidthWikitextElt()

static Parsoid\Utils\WTUtils::isZeroWidthWikitextElt ( DOMNode  $node)
static
Parameters
DOMNode $node
Returns
bool

◆ reinsertFosterableContent()

static Parsoid\Utils\WTUtils::reinsertFosterableContent ( Env  $env,
DOMNode  $node,
bool  $decode 
)
static
Parameters
Env $env
DOMNode $node
bool $decode
Returns
DOMNode|null

◆ skipOverEncapsulatedContent()

static Parsoid\Utils\WTUtils::skipOverEncapsulatedContent ( DOMNode  $node)
static

This function is only intended to be used on encapsulated nodes (Template/Extension/Param content).

Given a $node that has an about-id, it is assumed to have been generated by a template or extension. This function skips over all following nodes that belong to the same content and returns the first non-template node that follows it.

Parameters
DOMNode $node
Returns
DOMNode|null
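
A minimal sketch of the skipping behaviour, under the assumption that encapsulated content is exactly the run of adjacent siblings sharing one about id. `sketchSkipOverEncapsulatedContent` is a hypothetical function, `#mwt1` is an illustrative about id, and the fallback for nodes without an about id is an assumption of this example.

```php
<?php
// Simplified sketch: starting from an encapsulated node, walk past every
// adjacent sibling with the same about id and return whatever follows.
function sketchSkipOverEncapsulatedContent( DOMElement $node ): ?DOMNode {
	$about = $node->getAttribute( 'about' );
	if ( $about === '' ) {
		// Assumption: without an about id, simply move to the next sibling.
		return $node->nextSibling;
	}
	$next = $node->nextSibling;
	while ( $next instanceof DOMElement && $next->getAttribute( 'about' ) === $about ) {
		$next = $next->nextSibling;
	}
	return $next;
}

// Two spans generated by the same (hypothetical) template, followed by a <p>.
$doc = new DOMDocument();
$body = $doc->appendChild( $doc->createElement( 'body' ) );
$first = $body->appendChild( $doc->createElement( 'span' ) );
$first->setAttribute( 'about', '#mwt1' );
$second = $body->appendChild( $doc->createElement( 'span' ) );
$second->setAttribute( 'about', '#mwt1' );
$after = $body->appendChild( $doc->createElement( 'p' ) );

$result = sketchSkipOverEncapsulatedContent( $first );
echo $result !== null ? $result->nodeName : 'null', "\n";
```

The function returns null when the encapsulated content is the last thing in its parent, which matches the DOMNode|null return type documented above.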

◆ usesExtLinkSyntax()

static Parsoid\Utils\WTUtils::usesExtLinkSyntax ( DOMElement  $node,
?stdClass  $dp 
)
static

Helper function to detect when an A-node uses ext-link syntax.

The rel attribute is not sufficient anymore since mw:ExtLink is used for multiple link types.

Parameters
DOMElement $node
stdClass | null $dp
Returns
bool

◆ usesMagicLinkSyntax()

static Parsoid\Utils\WTUtils::usesMagicLinkSyntax ( DOMElement  $node,
stdClass  $dp = null 
)
static

Helper function to detect when an A-node uses magic-link syntax.

The rel attribute is not sufficient anymore since mw:ExtLink is used for multiple link types.

Parameters
DOMElement $node
stdClass | null $dp
Returns
bool

◆ usesURLLinkSyntax()

static Parsoid\Utils\WTUtils::usesURLLinkSyntax ( DOMElement  $node,
stdClass  $dp = null 
)
static

Helper function to detect when an A-node uses url-link syntax.

The rel attribute is not sufficient anymore since mw:ExtLink is used for multiple link types.

Parameters
DOMElement $node
stdClass | null $dp
Returns
bool

◆ usesWikiLinkSyntax()

static Parsoid\Utils\WTUtils::usesWikiLinkSyntax ( DOMElement  $node,
?stdClass  $dp 
)
static

Helper function to detect when an A-node uses [[..]]/[..]/... style syntax (for wikilinks, ext links, url links).

The rel-type is not sufficient anymore since mw:ExtLink is used for all three link syntaxes.

Parameters
DOMElement $node
stdClass | null $dp
Returns
bool

Member Data Documentation

◆ FIRST_ENCAP_REGEXP

const Parsoid\Utils\WTUtils::FIRST_ENCAP_REGEXP
Initial value:
= '#(?:^|\s)(mw:(?:Transclusion|Param|LanguageVariant|Extension(/[^\s]+)))(?=$|\s)#D'
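
To make the pattern easier to read, the sketch below runs it against a few typeof strings with `preg_match()`. The pattern is copied from the initial value shown above; the sample typeof values and the variable names are illustrative assumptions, not real Parsoid output.

```php
<?php
// Pattern copied from the FIRST_ENCAP_REGEXP initial value.
$firstEncapRegexp =
	'#(?:^|\s)(mw:(?:Transclusion|Param|LanguageVariant|Extension(/[^\s]+)))(?=$|\s)#D';

// Illustrative typeof values (assumed examples).
$samples = [
	'mw:Transclusion',          // a bare encapsulation type
	'mw:Extension/ref',         // extensions need a "/name" suffix
	'mw:Image mw:Transclusion', // the type may follow other space-separated types
	'mw:Transclusion/End',      // rejected: the type must end at whitespace or end of string
	'mw:Extension',             // rejected: no "/name" suffix
];

foreach ( $samples as $typeof ) {
	$matched = preg_match( $firstEncapRegexp, $typeof, $m ) === 1;
	echo str_pad( $typeof, 26 ), $matched ? 'matches ' . $m[1] : 'no match', "\n";
}
```

The leading `(?:^|\s)` and the trailing lookahead `(?=$|\s)` make the pattern match whole space-separated tokens only, so an end marker such as `mw:Transclusion/End` is not mistaken for a first wrapper.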

The documentation for this class was generated from the following file: