Parsoid
A bidirectional parser between wikitext and HTML5
|
These utilities pertain to querying, extracting, and modifying wikitext information from the DOM.
Static Public Member Functions | |
static | hasLiteralHTMLMarker (DataParsoid $dp) |
Check whether a node's data-parsoid object includes an indicator that the original wikitext was a literal HTML element (like table or p) | |
static | isLiteralHTMLNode (?Node $node) |
Run a node through hasLiteralHTMLMarker. | |
static | isZeroWidthWikitextElt (Node $node) |
static | isBlockNodeWithVisibleWT (Node $node) |
Is $node a block node that is also visible in wikitext? An example of an invisible block node is a <p> tag that Parsoid generated, or a <ul> or <ol> tag. | |
static | isATagFromWikiLinkSyntax (Element $node) |
Helper functions to detect when an A-$node uses [[..]]/[..]/... style syntax (for wikilinks, ext links, url links). | |
static | isATagFromExtLinkSyntax (Element $node) |
Helper function to detect when an A-node uses ext-link syntax. | |
static | isATagFromURLLinkSyntax (Element $node) |
Helper function to detect when an A-node uses url-link syntax. | |
static | isATagFromMagicLinkSyntax (Element $node) |
Helper function to detect when an A-node uses magic-link syntax. | |
static | matchTplType (Element $node) |
Check whether a node's typeof indicates that it is a template expansion. | |
static | hasExpandedAttrsType (Element $node) |
Check whether a typeof indicates that it signifies an expanded attribute. | |
static | isTplMarkerMeta (Node $node) |
Check whether a node is a meta tag that signifies a template expansion. | |
static | isTplStartMarkerMeta (Node $node) |
Check whether a node is a meta signifying the start of a template expansion. | |
static | isTplEndMarkerMeta (Node $node) |
Check whether a node is a meta signifying the end of a template expansion. | |
static | isNewElt (Node $node) |
This tests whether a DOM node is a new node added during an edit session or an existing node from parsed wikitext. | |
static | isIndentPre (Node $node) |
Check whether a pre is caused by indentation in the original wikitext. | |
static | isInlineMedia (Node $node) |
static | isGeneratedFigure (Node $node) |
static | indentPreDSRCorrection (Node $textNode) |
Find how much offset is necessary for the DSR of an indent-originated pre tag. | |
static | hasParsoidAboutId (Node $node) |
Check if $node is an element node that belongs to a template/extension. | |
static | isRedirectLink (Node $node) |
Does $node represent a redirect link? | |
static | isCategoryLink (?Node $node) |
Does $node represent a category link? | |
static | isSolTransparentLink (Node $node) |
Does $node represent a link that is sol-transparent? | |
static | emitsSolTransparentSingleLineWT (Node $node) |
Check if '$node' emits wikitext that is sol-transparent in wikitext form. | |
static | isFallbackIdSpan (Node $node) |
This is the span added to headings to add fallback ids for when legacy and HTML5 ids don't match up. | |
static | isRenderingTransparentNode (Node $node) |
These are primarily 'metadata'-like $nodes that don't show up in output rendering. | |
static | inHTMLTableTag (Node $node) |
Is $node nested inside a table tag that uses HTML instead of native wikitext? | |
static | isFirstEncapsulationWrapperNode (Node $node) |
Is $node the first wrapper element of encapsulated content? | |
static | isExtensionOutputingCoreMwDomSpec (Node $node, Env $env) |
Checks whether a first encapsulation wrapper node is encapsulating an extension that outputs MediaWiki Core DOM Spec HTML (https://www.mediawiki.org/wiki/Specs/HTML) | |
static | isEncapsulationWrapper (Node $node) |
Is $node an encapsulation wrapper elt? | |
static | isDOMFragmentWrapper (Node $node) |
Is $node a DOMFragment wrapper? | |
static | isSealedFragmentOfType (Node $node, string $type) |
Is $node a sealed DOMFragment of a specific type? | |
static | isParsoidSectionTag (Node $node) |
Is $node a Parsoid-generated <section> tag? | |
static | fromExtensionContent (Node $node, string $extType) |
Is the $node from extension content? | |
static | fromEncapsulatedContent (Node $node) |
Is $node from encapsulated (template, extension, etc.) content? | |
static | getWTSource (Frame $frame, Element $node) |
Compute, when possible, the wikitext source for a $node in an environment env. | |
static | getAboutSiblings (Node $node, string $about) |
Gets all siblings that follow '$node' that have an 'about' as their about id. | |
static | skipOverEncapsulatedContent (Node $node) |
This function is only intended to be used on encapsulated $nodes (Template/Extension/Param content). | |
static | encodeComment (string $comment) |
Comment encoding/decoding. | |
static | decodeComment (string $comment) |
Map an HTML DOM-escaped comment to a wikitext-escaped comment. | |
static | decodedCommentLength ( $node) |
Utility function: we often need to know the wikitext DSR length for an HTML DOM comment value. | |
static | getNativeExt (Env $env, Node $node) |
static | isIncludeTag (string $name) |
Is this an include directive? | |
static | isAnnOrExtTag (Env $env, string $name) |
Check if tag is an annotation or extension directive. Adapted from a similar grammar function. | |
static | createEmptyLocalizationFragment (Document $doc) |
Creates a DocumentFragment containing a single span with type "mw:I18n". | |
static | createPageContentI18nFragment (Document $doc, string $key, ?array $params=null) |
Creates an internationalization (i18n) message that will be localized into the page content language. | |
static | createInterfaceI18nFragment (Document $doc, string $key, ?array $params=null) |
Creates an internationalization (i18n) message that will be localized into the user interface language. | |
static | createLangI18nFragment (Document $doc, Bcp47Code $lang, string $key, ?array $params=null) |
Creates an internationalization (i18n) message that will be localized into an arbitrary language. | |
static | addPageContentI18nAttribute (Element $element, string $name, string $key, ?array $params=null) |
Adds to $element the internationalization information needed for the attribute $name to be localized in a later pass into the page content language. | |
static | addInterfaceI18nAttribute (Element $element, string $name, string $key, ?array $params=null) |
Adds to $element the internationalization information needed for the attribute $name to be localized in a later pass into the user interface language. | |
static | addLangI18nAttribute (Element $element, Bcp47Code $lang, string $name, string $key, ?array $params=null) |
Adds to $element the internationalization information needed for the attribute $name to be localized in a later pass into the provided language. | |
static | matchAnnotationMeta (Node $node) |
Check whether a node is an annotation meta; if yes, returns its type. | |
static | extractAnnotationType (Node $node, bool &$isStart=false) |
Extract the annotation type, excluding potential "/End" suffix; returns null if not a valid annotation meta. | |
static | isAnnotationStartMarkerMeta (Node $node) |
Check whether a node is a meta signifying the start of an annotated part of the DOM. | |
static | isAnnotationEndMarkerMeta (Node $node) |
Check whether a node is a meta signifying the end of an annotated part of the DOM. | |
static | isMovedMetaTag (Node $node) |
Check whether the meta tag was moved from its initial position. | |
static | isMarkerAnnotation (?Node $n) |
Returns true if a node is a (start or end) annotation meta tag. | |
static | getMediaFormat (Element $node) |
Extracts the media format from the attribute string. | |
static | hasVisibleCaption (Element $node) |
static | textContentFromCaption (Node $node) |
Ref dom post-processing happens after adding media info, so the linkbacks aren't available in the textContent added to the alt. | |
Public Attributes | |
const | ANNOTATION_META_TYPE_REGEXP = '#^mw:(?:Annotation/([\w\d]+))(?:/End)?$#uD' |
Regexp for checking marker metas typeofs representing annotation markup. | |
These utilities pertain to querying, extracting, and modifying wikitext information from the DOM.
|
static |
Adds to $element the internationalization information needed for the attribute $name to be localized in a later pass into the user interface language.
Element | $element | element on which to add internationalization information |
string | $name | name of the attribute whose value will be localized |
string | $key | message key used for the attribute value localization |
?array | $params | parameters for localization |
|
static |
Adds to $element the internationalization information needed for the attribute $name to be localized in a later pass into the provided language.
The use of this method is discouraged; use ::addPageContentI18nAttribute(...) and ::addInterfaceI18nAttribute(...) where possible rather than, respectively, ::addLangI18nAttribute(..., $wgContLang, ...) and ::addLangI18nAttribute(..., $wgLang, ...).
Element | $element | element on which to add internationalization information |
Bcp47Code | $lang | language in which the message will be localized |
string | $name | name of the attribute whose value will be localized |
string | $key | message key used for the attribute value localization |
?array | $params | parameters for localization |
|
static |
Adds to $element the internationalization information needed for the attribute $name to be localized in a later pass into the page content language.
Element | $element | element on which to add internationalization information |
string | $name | name of the attribute whose value will be localized |
string | $key | message key used for the attribute value localization |
?array | $params | parameters for localization |
|
static |
Creates a DocumentFragment containing a single span with type "mw:I18n".
The created span should be filled in with setDataNodeI18n to be valid.
Document | $doc |
DOMException |
|
static |
Creates an internationalization (i18n) message that will be localized into the user interface language.
The returned DocumentFragment contains, as a single child, a span element with the appropriate information for later localization.
Document | $doc | |
string | $key | message key for the message to be localized |
?array | $params | parameters for localization |
DOMException |
|
static |
Creates an internationalization (i18n) message that will be localized into an arbitrary language.
The returned DocumentFragment contains, as a single child, a span element with the appropriate information for later localization. The use of this method is discouraged; use ::createPageContentI18nFragment(...) and ::createInterfaceI18nFragment(...) where possible rather than, respectively, ::createLangI18nFragment(..., $wgContLang, ...) and ::createLangI18nFragment(..., $wgLang,...).
Document | $doc | |
Bcp47Code | $lang | language for the localization |
string | $key | message key for the message to be localized |
?array | $params | parameters for localization |
DOMException |
|
static |
Creates an internationalization (i18n) message that will be localized into the page content language.
The returned DocumentFragment contains, as a single child, a span element with the appropriate information for later localization.
Document | $doc | |
string | $key | message key for the message to be localized |
?array | $params | parameters for localization |
DOMException |
|
static |
Map an HTML DOM-escaped comment to a wikitext-escaped comment.
string | $comment | DOM-escaped comment. |
|
static |
Utility function: we often need to know the wikitext DSR length for an HTML DOM comment value.
Comment | CommentTk | $node | A comment node containing a DOM-escaped comment. |
Returns the wikitext length needed to encode this comment, including the <!-- and --> delimiters.
|
static |
Check if '$node' emits wikitext that is sol-transparent in wikitext form.
This is a test for wikitext that doesn't introduce line breaks.
Comment, whitespace text $nodes, category links, redirect links, behavior switches, and include directives currently satisfy this definition.
This should come close to matching TokenUtils.isSolTransparent().
Node | $node |
|
static |
Comment encoding/decoding.
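The wikitext comment rule is very simple: <!-- starts a comment, and --> ends a comment. This means we can have almost anything as the contents of a comment (except the string "-->", but see below), including several things that are not valid in HTML5 comments:
First, the HTML5 comment parsing algorithm [0] leniently accepts --!> as a closing comment tag.
Second, if the comment's data matches /^-?>/, HTML5 ends the comment. For example, <!-->stuff<--> breaks up as <!--> (the comment) followed by stuff<--> (as text).
Finally, comment data should not contain two consecutive hyphen-minus characters (--), nor end in a hyphen-minus character (/-$/), as defined in the spec [1].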
We work around all these problems by using HTML entity encoding inside the comment body. The characters -, >, and & must be encoded in order to prevent premature termination of the comment by one of the cases above. Encoding other characters is optional; all entities will be decoded during wikitext serialization.
In order to allow arbitrary content inside a wikitext comment, including the forbidden string "-->" we also do some minimal entity decoding on the wikitext. We are also limited by our inability to encode DSR attributes on the comment $node, so our wikitext entity decoding must be 1-to-1: that is, there must be a unique "decoded" string for every wikitext sequence, and for every decoded string there must be a unique wikitext which creates it.
The basic idea here is to replace every string ab*c with the string with one more b in it. This creates a string with no instance of "ac", so you can use 'ac' to encode one more code point. In this case a is "--&", "b" is "amp;", and "c" is "gt;" and we use ac to encode "-->" (which is otherwise unspeakable in wikitext).
Note that any user content which does not match the regular expression /--(>|&(amp;)*gt;)/ is unchanged in its wikitext representation, as shown in the first two examples below.
User-authored comment text | Wikitext | HTML5 DOM
& - > | & - > | &amp; &#45; &gt;
Use &gt; here | Use &gt; here | Use &amp;gt; here
--> | --&gt; | &#45;&#45;&gt;
--&gt; | --&amp;gt; | &#45;&#45;&amp;gt;
--&amp;gt; | --&amp;amp;gt; | &#45;&#45;&amp;amp;gt;
[0] http://www.w3.org/TR/html5/syntax.html#comment-start-state
[1] http://www.w3.org/TR/html5/syntax.html#comments
Map a wikitext-escaped comment to an HTML DOM-escaped comment.
string | $comment | Wikitext-escaped comment. |
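The "ab*c" scheme described above can be sketched in a few lines. This is a standalone Python illustration, not Parsoid's PHP implementation: the function names are hypothetical, the entity forms chosen for "-", ">", and "&" are an assumption, and Python's html.unescape stands in for Parsoid's own entity decoder.

```python
import html
import re

# Wikitext sequences of the form a b* c, with a = "--&", b = "amp;", c = "gt;".
_WT_ESCAPED = re.compile(r"--&(?:amp;)*gt;")
# Sequences in the true comment text that must gain one "b" (or "-->" itself).
_NEEDS_ESCAPE = re.compile(r"--(?:&(?:amp;)*gt;|>)")
# Assumed entity spellings; any equivalent character reference would do.
_HTML_ENTITY = {"-": "&#45;", ">": "&gt;", "&": "&amp;"}


def wikitext_to_text(wikitext):
    """Undo wikitext escaping: drop one "amp;", or turn "--&gt;" into "-->"."""
    def drop_one(m):
        s = m.group(0)
        return "-->" if s == "--&gt;" else "--&" + s[len("--&amp;"):]
    return _WT_ESCAPED.sub(drop_one, wikitext)


def text_to_wikitext(text):
    """Apply wikitext escaping so that "-->" never appears in the result."""
    def add_one(m):
        s = m.group(0)
        return "--&gt;" if s == "-->" else "--&amp;" + s[len("--&"):]
    return _NEEDS_ESCAPE.sub(add_one, text)


def encode_comment(wikitext):
    """Wikitext-escaped comment -> HTML DOM-escaped comment (sketch)."""
    text = wikitext_to_text(wikitext)
    # Entity-encode "-", ">" and "&" so the HTML5 comment cannot end early.
    return re.sub(r"[->&]", lambda m: _HTML_ENTITY[m.group(0)], text)


def decode_comment(dom_comment):
    """HTML DOM-escaped comment -> wikitext-escaped comment (sketch)."""
    return text_to_wikitext(html.unescape(dom_comment))
```

Under these assumptions, text_to_wikitext("-->") gives "--&gt;", encode_comment("--&gt;") gives "&#45;&#45;&gt;", and decode_comment maps that DOM form back to "--&gt;". The two wikitext-level functions are exact inverses of each other, which is the 1-to-1 property the description requires.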
|
static |
Extract the annotation type, excluding potential "/End" suffix; returns null if not a valid annotation meta.
&$isStart is set to true if the annotation is a start tag, false otherwise.
Node | $node | |
bool | &$isStart |
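The matching rule can be illustrated with a Python translation of ANNOTATION_META_TYPE_REGEXP. This is a sketch under assumptions: it works on a single typeof string rather than a DOM node, it returns a tuple where the PHP method uses a by-reference $isStart, the function name is hypothetical, and \Z approximates the PCRE D modifier.

```python
import re
from typing import Optional, Tuple

# Approximate Python translation of '#^mw:(?:Annotation/([\w\d]+))(?:/End)?$#uD'.
# \Z (end of string only) plays the role of "$" under the PCRE "D" modifier.
ANNOTATION_META_TYPE = re.compile(r"^mw:(?:Annotation/([\w\d]+))(?:/End)?\Z")


def extract_annotation_type(typeof: str) -> Optional[Tuple[str, bool]]:
    """Return (annotation type, is_start), or None if typeof is not an annotation meta type."""
    m = ANNOTATION_META_TYPE.match(typeof)
    if m is None:
        return None
    # It is a start marker when nothing (i.e. no "/End") follows the captured type.
    is_start = m.end(1) == len(typeof)
    return m.group(1), is_start
```

With this sketch, "mw:Annotation/translate" yields ("translate", True), "mw:Annotation/translate/End" yields ("translate", False), and a non-annotation typeof such as "mw:Transclusion" yields None.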
|
static |
Is $node from encapsulated (template, extension, etc.) content?
Node | $node |
|
static |
Is the $node from extension content?
Node | $node | |
string | $extType |
|
static |
Gets all siblings that follow '$node' that have an 'about' as their about id.
This is used to fetch transclusion/extension content by using the about-id as the key. This works because transclusion/extension content is a forest of dom-trees formed by adjacent dom-nodes. This is the contract that template encapsulation, dom-reuse, and VE code all have to abide by.
The only exception to this adjacency rule is IEW nodes in fosterable positions (in tables) which are not span-wrapped to prevent them from getting fostered out.
Node | $node | |
string | $about |
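The adjacency contract can be sketched with a toy sibling chain. The Node class below is a hypothetical stand-in, not Parsoid's DOM; including the starting node in the result is an assumption, and the IEW exception for fosterable positions is deliberately left out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(eq=False)
class Node:
    """Hypothetical minimal DOM node: a tag name, an about id, and a next sibling."""
    name: str
    about: Optional[str] = None
    next_sibling: Optional["Node"] = None


def chain(*nodes: Node) -> Node:
    """Link nodes as consecutive siblings and return the first one."""
    for left, right in zip(nodes, nodes[1:]):
        left.next_sibling = right
    return nodes[0]


def about_siblings(node: Node, about: str) -> List[Node]:
    """Collect node plus the adjacent following siblings that share the about id."""
    found = [node]
    cur = node.next_sibling
    # Stop at the first sibling with a different (or missing) about id:
    # encapsulated content is assumed to be a run of adjacent nodes.
    while cur is not None and cur.about == about:
        found.append(cur)
        cur = cur.next_sibling
    return found
```

For a chain of siblings with about ids #mwt1, #mwt1, #mwt2, #mwt1, calling about_siblings on the first node returns only the first two nodes. The last node is not picked up because it is not adjacent to the run.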
|
static |
Extracts the media format from the attribute string.
Element | $node |
|
static |
Env | $env | |
Node | $node |
|
static |
Compute, when possible, the wikitext source for a $node in an environment env.
Returns null if the source cannot be extracted.
Frame | $frame | |
Element | $node |
|
static |
Check whether a typeof indicates that it signifies an expanded attribute.
Element | $node |
|
static |
Check whether a node's data-parsoid object includes an indicator that the original wikitext was a literal HTML element (like table or p)
DataParsoid | $dp |
|
static |
Check if $node is an element node that belongs to a template/extension.
NOTE: Use with caution. This technique works reliably for the root level elements of tpl-content DOM subtrees since only they are guaranteed to be marked and nested content might not necessarily be marked.
Node | $node |
|
static |
Element | $node |
|
static |
Find how much offset is necessary for the DSR of an indent-originated pre tag.
Node | $textNode |
|
static |
Is $node nested inside a table tag that uses HTML instead of native wikitext?
Node | $node |
|
static |
Check if tag is an annotation or extension directive. Adapted from a similar grammar function.
Env | $env | |
string | $name |
|
static |
Check whether a node is a meta signifying the end of an annotated part of the DOM.
Node | $node |
|
static |
Check whether a node is a meta signifying the start of an annotated part of the DOM.
Node | $node |
|
static |
Helper function to detect when an A-node uses ext-link syntax.
The rel attribute is no longer sufficient, since mw:ExtLink is used for multiple link types.
Element | $node |
|
static |
Helper function to detect when an A-node uses magic-link syntax.
The rel attribute is no longer sufficient, since mw:ExtLink is used for multiple link types.
Element | $node |
|
static |
Helper function to detect when an A-node uses url-link syntax.
The rel attribute is no longer sufficient, since mw:ExtLink is used for multiple link types.
Element | $node |
|
static |
Helper functions to detect when an A-$node uses [[..]]/[..]/... style syntax (for wikilinks, ext links, url links).
The rel type is no longer sufficient, since mw:ExtLink is used for all three link syntaxes.
Element | $node |
|
static |
Is $node a block node that is also visible in wikitext? An example of an invisible block node is a <p> tag that Parsoid generated, or a <ul> or <ol> tag.
Node | $node |
|
static |
Does $node represent a category link?
?Node | $node |
|
static |
Is $node a DOMFragment wrapper?
Node | $node |
|
static |
Is $node an encapsulation wrapper elt?
All root-level $nodes of generated content are considered encapsulation wrappers and share an about-id.
Node | $node |
|
static |
Checks whether a first encapsulation wrapper node is encapsulating an extension that outputs MediaWiki Core DOM Spec HTML (https://www.mediawiki.org/wiki/Specs/HTML)
Node | $node | |
Env | $env |
|
static |
This is the span added to headings to add fallback ids for when legacy and HTML5 ids don't match up.
This prevents broken links to legacy ids.
Node | $node |
|
static |
Is $node the first wrapper element of encapsulated content?
Node | $node |
|
static |
Node | $node |
|
static |
Is this an include directive?
string | $name |
|
static |
Check whether a pre is caused by indentation in the original wikitext.
Node | $node |
|
static |
Node | $node |
|
static |
Run a node through hasLiteralHTMLMarker.
?Node | $node |
|
static |
Returns true if a node is a (start or end) annotation meta tag.
?Node | $n |
|
static |
Check whether the meta tag was moved from its initial position.
Node | $node |
|
static |
This tests whether a DOM node is a new node added during an edit session or an existing node from parsed wikitext.
As written, this function can only be used on non-template/extension content or on the top-level nodes of template/extension content. This test will return the wrong results on non-top-level $nodes of template/extension content.
Node | $node |
|
static |
Is $node a Parsoid-generated <section> tag?
Node | $node |
|
static |
Does $node represent a redirect link?
Node | $node |
|
static |
These are primarily 'metadata'-like $nodes that don't show up in output rendering.
Node | $node |
|
static |
Is $node a sealed DOMFragment of a specific type?
Node | $node | |
string | $type |
|
static |
Does $node represent a link that is sol-transparent?
Node | $node |
|
static |
Check whether a node is a meta signifying the end of a template expansion.
Node | $node |
|
static |
Check whether a node is a meta tag that signifies a template expansion.
Node | $node |
|
static |
Check whether a node is a meta signifying the start of a template expansion.
Node | $node |
|
static |
Node | $node |
|
static |
Check whether a node is an annotation meta; if yes, returns its type.
Node | $node |
|
static |
Check whether a node's typeof indicates that it is a template expansion.
Element | $node |
|
static |
This function is only intended to be used on encapsulated $nodes (Template/Extension/Param content).
Given a '$node' that has an about-id, it is assumed that it is generated by templates or extensions. This function skips over all following content nodes and returns the first non-template node that follows it.
Node | $node |
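The skipping behavior described above can be sketched as a walk along the sibling chain. The Node class is a hypothetical stand-in for a DOM node, and the handling of a node without an about id (returning its next sibling) is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class Node:
    """Hypothetical minimal DOM node: a tag name, an about id, and a next sibling."""
    name: str
    about: Optional[str] = None
    next_sibling: Optional["Node"] = None


def skip_over_encapsulated_content(node: Node) -> Optional[Node]:
    """Return the first following sibling that is outside node's about-id run."""
    about = node.about
    cur = node.next_sibling
    if about is None:
        # Assumption: a node with no about id is not encapsulated content.
        return cur
    while cur is not None and cur.about == about:
        cur = cur.next_sibling
    return cur  # None when the encapsulated run reaches the end of the sibling list
```

Given siblings with about ids #mwt1, #mwt1, and none, calling the function on the first node returns the third. If the run extends to the last sibling, the result is None.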
|
static |
Ref dom post-processing happens after adding media info, so the linkbacks aren't available in the textContent added to the alt.
However, when serializing, they are in the caption elements. So, this special handler drops the linkbacks for the purpose of comparison.
Node | $node |