Parsoid
A bidirectional parser between wikitext and HTML5
Extensions are expected to use only these interfaces and are strongly discouraged from calling Parsoid code directly.
Public Member Functions | |
__construct (Env $env, ?array $options=null) | |
pushError (string $key,... $params) | |
Collect errors while parsing. | |
createInterfaceI18nFragment (string $key, ?array $params) | |
Creates an internationalization (i18n) message that will be localized into the user interface language. | |
createPageContentI18nFragment (string $key, ?array $params) | |
Creates an internationalization (i18n) message that will be localized into the page content language. | |
createLangI18nFragment (Bcp47Code $lang, string $key, ?array $params) | |
Creates an internationalization (i18n) message that will be localized into an arbitrary language. | |
addInterfaceI18nAttribute (Element $element, string $name, string $key, ?array $params) | |
Adds to $element the internationalization information needed for the attribute $name to be localized in a later pass into the user interface language. | |
addPageContentI18nAttribute (Element $element, string $name, string $key, array $params) | |
Adds to $element the internationalization information needed for the attribute $name to be localized in a later pass into the page content language. | |
addLangI18nAttribute (Element $element, Bcp47Code $lang, string $name, string $key, array $params) | |
Adds to $element the internationalization information needed for the attribute $name to be localized in a later pass into the provided language. | |
getErrors () | |
getTopLevelDoc () | |
Returns the main document we're parsing. | |
newAboutId () | |
Get a new about id for marking extension output. FIXME: This should never really be needed since the extension API handles this on behalf of extensions, but Cite has one use case where implicit <references /> output is added. | |
getSiteConfig () | |
Get the site configuration to let extensions customize their behavior based on how the wiki is configured. | |
getPageConfig () | |
Get the page configuration. FIXME: Unsure if we need to provide this access yet. | |
getMetadata () | |
Get the ContentMetadataCollector corresponding to the top-level page. | |
getTitleUri (Title $title) | |
Get the URI to link to a title. | |
getPageUri () | |
Get a URI for the current page. | |
makeTitle (string $str, int $namespaceId) | |
Make a title from an input string. | |
inTemplate () | |
Are we parsing in a template context? | |
isPreview () | |
Are we parsing for a preview? FIXME: Right now, we never do; when we do, this needs to be modified to reflect reality. @unstable | |
parentExtTag () | |
If we are parsing in the context of a parent extension tag, return the name of that extension tag. FIXME: Is this something that can come from the frame? | |
parentExtTagOpts () | |
If we are parsing in the context of a parent extension tag, return the parsing options set by that tag. FIXME: Is this something that can come from the frame? | |
getContentDOM (string $contentId) | |
Get the content DOM corresponding to an id. | |
clearContentDOM (string $contentId) | |
wikitextToDOM (string $wikitext, array $opts, bool $sol) | |
Parse wikitext to DOM. | |
extTagToDOM (array $extArgs, string $wikitext, array $opts) | |
Parse extension tag to DOM. | |
setTempNodeData (Element $node, $data) | |
Set temporary data into the DOM node that will be discarded when DOM is serialized. | |
getTempNodeData (Element $node, string $key) | |
Get temporary data from the DOM node that will be discarded when DOM is serialized. | |
extArgToDOM (array $extArgs, string $key, string $context="inline") | |
Process a specific extension arg as wikitext and return its DOM equivalent. | |
extArgsToArray (array $extArgs) | |
Convert the ext args representation from an array of KV objects to a plain associative array mapping arg name strings to arg value strings. | |
findAndUpdateArg (array &$extArgs, string $key, ?Closure $updater=null) | |
This method finds a requested arg by key name and returns its current value. | |
addNewArg (array &$extArgs, string $key, string $value) | |
This method adds a new argument to the extension args array. | |
log (string $prefix,... $args) | |
Forwards the logging request to the underlying logger. | |
processAttributeEmbeddedHTML (Element $elt, Closure $proc) | |
Extensions might be interested in examining (their) content embedded in data-mw attributes that don't otherwise show up in the DOM. | |
preprocessWikitext (string $wikitext) | |
Equivalent of 'preprocess' from Parser.php in core. | |
htmlToDom (string $html, ?Document $doc=null, ?array $options=[]) | |
Parse input string into DOM. | |
domToHtml (Node $node, bool $innerHTML=false, bool $releaseDom=false) | |
Serialize DOM element to string (inner/outer HTML is controlled by flag). | |
setHtml2wtStateFlag (string $flag) | |
FIXME: This is a bit broken - shouldn't be needed ideally. | |
extStartTagToWikitext (Element $node) | |
Emit the opening tag (including attributes) for the extension represented by this node. | |
domToWikitext (array $opts, Element $node, bool $releaseDom=false) | |
Convert the input DOM to wikitext. | |
htmlToWikitext (array $opts, string $html) | |
Convert the HTML body of an extension to wikitext. | |
getOrigSrc (Element $elt, bool $inner, callable $checkIfOrigSrcReusable) | |
Get the original source for an element. | |
domChildrenToWikitext (Element $elt, int $context) | |
escapeWikitext (string $str, Node $node, int $context) | |
Escape any wikitext-like constructs in a string so that when the output is parsed, it renders as a string. | |
postProcessDOM (Document $doc) | |
EXTAPI-FIXME: We have to figure out what it means to run a DOM PP pass (and what processors and what handlers apply) on content models that are not wikitext. | |
renderMedia (string $titleStr, array $imageOpts, ?string &$error=null, ?bool $forceBlock=false, ?bool $suppressMediaFormats=false) | |
Produce the HTML rendering of a title string and media options as the wikitext parser would for a wikilink in the file namespace. | |
serializeMedia (MediaStructure $ms) | |
Serialize a MediaStructure to a title and media options string. | |
addModules (array $modules) | |
addModuleStyles (array $modulestyles) | |
getExternalLinkAttribs (string $url) | |
Get an array of attributes to apply to an anchor linking to $url. | |
Static Public Member Functions | |
static | migrateChildrenAndTransferWrapperDataAttribs (Element $from, Element $to) |
Copy $from->childNodes to $to and clone the data attributes of $from to $to. | |
Public Attributes | |
$extTag | |
const | IN_SOL = 1 |
Bit flags describing escaping / serializing context in html -> wt mode. | |
const | IN_MEDIA = 2 |
const | IN_LINK = 4 |
const | IN_IMG_CAPTION = 8 |
const | IN_OPTION = 16 |
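The context flags above are powers of two, so several can be combined into one integer. The following is a minimal sketch of OR-ing and testing them. It uses local stand-in constants that mirror the listed values, so it runs without Parsoid installed; real extension code would presumably reference the constants on ParsoidExtensionAPI instead.

```php
<?php
// Local stand-ins mirroring the values listed above (assumption: real code
// would use the class constants on ParsoidExtensionAPI rather than these).
const IN_SOL = 1;
const IN_MEDIA = 2;
const IN_LINK = 4;
const IN_IMG_CAPTION = 8;
const IN_OPTION = 16;

// Combine flags with bitwise OR to describe a compound context.
$context = IN_SOL | IN_LINK;

// Test individual flags with bitwise AND.
$atStartOfLine = ( $context & IN_SOL ) !== 0;
$inMedia = ( $context & IN_MEDIA ) !== 0;

var_dump( $context, $atStartOfLine, $inMedia );
```

A combined value of this kind is what the `$context` parameter of escapeWikitext and domChildrenToWikitext is documented to accept.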
Extensions are expected to use only these interfaces and are strongly discouraged from calling Parsoid code directly.
Code review is expected to catch these discouraged code patterns. We'll have to finish grappling with the extension and hooks API to go down this path seriously. Till then, we'll have extensions leveraging existing code as in the native extension code in this repository.
Wikimedia\Parsoid\Ext\ParsoidExtensionAPI::__construct | ( | Env | $env, |
?array | $options = null ) |
Env | $env | |
?array | $options |
Wikimedia\Parsoid\Ext\ParsoidExtensionAPI::addInterfaceI18nAttribute | ( | Element | $element, |
string | $name, | ||
string | $key, | ||
?array | $params ) |
Adds to $element the internationalization information needed for the attribute $name to be localized in a later pass into the user interface language.
Element | $element | element on which to add internationalization information |
string | $name | name of the attribute whose value will be localized |
string | $key | message key used for the attribute value localization |
?array | $params | parameters for localization |
Wikimedia\Parsoid\Ext\ParsoidExtensionAPI::addLangI18nAttribute | ( | Element | $element, |
Bcp47Code | $lang, | ||
string | $name, | ||
string | $key, | ||
array | $params ) |
Adds to $element the internationalization information needed for the attribute $name to be localized in a later pass into the provided language.
The use of this method is discouraged; use ::addPageContentI18nAttribute(...) and ::addInterfaceI18nAttribute(...) where possible rather than, respectively, ::addLangI18nAttribute(..., $wgContLang, ...) and ::addLangI18nAttribute(..., $wgLang, ...).
Element | $element | element on which to add internationalization information |
Bcp47Code | $lang | language in which the attribute will be localized |
string | $name | name of the attribute whose value will be localized |
string | $key | message key used for the attribute value localization |
array | $params | parameters for localization |
Wikimedia\Parsoid\Ext\ParsoidExtensionAPI::addModules | ( | array | $modules | ) |
array | $modules |
Wikimedia\Parsoid\Ext\ParsoidExtensionAPI::addModuleStyles | ( | array | $modulestyles | ) |
array | $modulestyles |
Wikimedia\Parsoid\Ext\ParsoidExtensionAPI::addNewArg | ( | array & | $extArgs, |
string | $key, | ||
string | $value ) |
This method adds a new argument to the extension args array.
KV[] | &$extArgs | |
string | $key | |
string | $value |
Wikimedia\Parsoid\Ext\ParsoidExtensionAPI::addPageContentI18nAttribute | ( | Element | $element, |
string | $name, | ||
string | $key, | ||
array | $params ) |
Adds to $element the internationalization information needed for the attribute $name to be localized in a later pass into the page content language.
Element | $element | element on which to add internationalization information |
string | $name | name of the attribute whose value will be localized |
string | $key | message key used for the attribute value localization |
array | $params | parameters for localization |
Wikimedia\Parsoid\Ext\ParsoidExtensionAPI::createInterfaceI18nFragment | ( | string | $key, |
?array | $params ) |
Creates an internationalization (i18n) message that will be localized into the user interface language.
The returned DocumentFragment contains, as a single child, a span element with the appropriate information for later localization.
string | $key | message key for the message to be localized |
?array | $params | parameters for localization |
Wikimedia\Parsoid\Ext\ParsoidExtensionAPI::createLangI18nFragment | ( | Bcp47Code | $lang, |
string | $key, | ||
?array | $params ) |
Creates an internationalization (i18n) message that will be localized into an arbitrary language.
The returned DocumentFragment contains, as a single child, a span element with the appropriate information for later localization. The use of this method is discouraged; use ::createPageContentI18nFragment(...) and ::createInterfaceI18nFragment(...) where possible rather than, respectively, ::createLangI18nFragment($wgContLang, ...) and ::createLangI18nFragment($wgLang, ...).
Bcp47Code | $lang | language in which the message will be localized |
string | $key | message key for the message to be localized |
?array | $params | parameters for localization |
Wikimedia\Parsoid\Ext\ParsoidExtensionAPI::createPageContentI18nFragment | ( | string | $key, |
?array | $params ) |
Creates an internationalization (i18n) message that will be localized into the page content language.
The returned DocumentFragment contains, as a single child, a span element with the appropriate information for later localization.
string | $key | message key for the message to be localized |
?array | $params | parameters for localization |
Wikimedia\Parsoid\Ext\ParsoidExtensionAPI::domChildrenToWikitext | ( | Element | $elt, |
int | $context ) |
Element | $elt | |
int | $context | OR-ed bit flags specifying escaping / serialization context |
Wikimedia\Parsoid\Ext\ParsoidExtensionAPI::domToHtml | ( | Node | $node, |
bool | $innerHTML = false, | ||
bool | $releaseDom = false ) |
Serialize DOM element to string (inner/outer HTML is controlled by flag).
If $releaseDom is set to true, the DOM will be left in non-canonical form and is not safe to use after this call. This is primarily a performance optimization.
Node | $node | |
bool | $innerHTML | If true, the inner HTML of the element will be returned. This flag defaults to false. |
bool | $releaseDom | If true, the DOM will not be in canonical form after this call. This flag defaults to false. |
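To illustrate the inner/outer distinction that `$innerHTML` controls, here is a small sketch using PHP's built-in DOM extension. It does not call Parsoid at all: the element and variable names are made up for the example, and Parsoid's own serializer may differ in details such as attribute handling.

```php
<?php
// Build <p>Hello <b>world</b></p> with PHP's bundled DOM extension.
$doc = new DOMDocument();
$p = $doc->createElement( 'p' );
$p->appendChild( $doc->createTextNode( 'Hello ' ) );
$p->appendChild( $doc->createElement( 'b', 'world' ) );
$doc->appendChild( $p );

// Outer HTML: the element itself plus everything inside it.
$outer = $doc->saveHTML( $p );

// Inner HTML: only the serialized children, concatenated in order.
$inner = '';
foreach ( $p->childNodes as $child ) {
	$inner .= $doc->saveHTML( $child );
}

echo $outer, "\n", $inner, "\n";
```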
Wikimedia\Parsoid\Ext\ParsoidExtensionAPI::domToWikitext | ( | array | $opts, |
Element | $node, | ||
bool | $releaseDom = false ) |
Convert the input DOM to wikitext.
array | $opts |
Element | $node | DOM to serialize |
bool | $releaseDom | If $releaseDom is set to true, the DOM will be left in non-canonical form and is not safe to use after this call. This is primarily a performance optimization. This flag defaults to false. |
Wikimedia\Parsoid\Ext\ParsoidExtensionAPI::escapeWikitext | ( | string | $str, |
Node | $node, | ||
int | $context ) |
Escape any wikitext-like constructs in a string so that when the output is parsed, it renders as a string.
The escaping is sensitive to the context in which the string is embedded. For example, a "*" is not safe at the start of a line (since it will parse as a list item), but is safe if it is not in a start of line context. Similarly the "|" character is safe outside tables, links, and transclusions.
string | $str | |
Node | $node | |
int | $context | OR-ed bit flags specifying escaping / serialization context |
Wikimedia\Parsoid\Ext\ParsoidExtensionAPI::extArgsToArray | ( | array | $extArgs | ) |
Convert the ext args representation from an array of KV objects to a plain associative array mapping arg name strings to arg value strings.
array<KV> | $extArgs |
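The conversion described above can be pictured with the following sketch. `ExampleKV` and `kvListToArray` are hypothetical stand-ins written for this illustration, assuming each KV object carries one string key and one string value; they are not Parsoid's actual classes or implementation.

```php
<?php
// Hypothetical stand-in for a KV object: one key, one value.
final class ExampleKV {
	public string $k;
	public string $v;

	public function __construct( string $k, string $v ) {
		$this->k = $k;
		$this->v = $v;
	}
}

/**
 * Flatten a list of key/value objects into name => value pairs.
 * In this sketch, a later duplicate key overwrites an earlier one.
 */
function kvListToArray( array $extArgs ): array {
	$out = [];
	foreach ( $extArgs as $kv ) {
		$out[$kv->k] = $kv->v;
	}
	return $out;
}

$args = [ new ExampleKV( 'name', 'foo' ), new ExampleKV( 'group', 'notes' ) ];
$plain = kvListToArray( $args );
var_dump( $plain );
```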
Wikimedia\Parsoid\Ext\ParsoidExtensionAPI::extArgToDOM | ( | array | $extArgs, |
string | $key, | ||
string | $context = "inline" ) |
Process a specific extension arg as wikitext and return its DOM equivalent.
By default, this method processes the argument value in inline context and normalizes every whitespace character to a single space.
KV[] | $extArgs | |
string | $key | should be lower-case |
string | $context |
Wikimedia\Parsoid\Ext\ParsoidExtensionAPI::extStartTagToWikitext | ( | Element | $node | ) |
Emit the opening tag (including attributes) for the extension represented by this node.
Element | $node |
Wikimedia\Parsoid\Ext\ParsoidExtensionAPI::extTagToDOM | ( | array | $extArgs, |
string | $wikitext, | ||
array | $opts ) |
Parse extension tag to DOM.
If a wrapper tag is requested, beyond parsing the contents of the extension tag, this method wraps the contents in a custom wrapper element (ex:
array | $extArgs | Args sanitized and applied to wrapper |
string | $wikitext | Wikitext content of the tag |
array | $opts |
Wikimedia\Parsoid\Ext\ParsoidExtensionAPI::findAndUpdateArg | ( | array & | $extArgs, |
string | $key, | ||
?Closure | $updater = null ) |
This method finds a requested arg by key name and returns its current value.
If a closure is passed in to update the current value, it is used to update the arg.
KV[] | &$extArgs | Array of extension args |
string | $key | Argument key whose value needs an update |
?Closure | $updater | $updater will get the existing string value for the arg and is expected to return an updated value. |
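The find-and-update behavior can be sketched as below. This is an illustration under stated assumptions, not the library's code: `ExampleArg` and `findAndUpdateExample` are hypothetical names, keys are matched exactly, and the function is assumed to return the value as it was before the updater ran (the description above does not say which value is returned when an updater is supplied).

```php
<?php
// Hypothetical stand-in for an extension arg with a string key and value.
final class ExampleArg {
	public string $k;
	public string $v;

	public function __construct( string $k, string $v ) {
		$this->k = $k;
		$this->v = $v;
	}
}

/**
 * Return the value stored under $key, or null if no arg matches.
 * If $updater is given, replace the stored value with its result.
 */
function findAndUpdateExample( array &$extArgs, string $key, ?Closure $updater = null ): ?string {
	foreach ( $extArgs as $i => $arg ) {
		if ( $arg->k === $key ) {
			$current = $arg->v;
			if ( $updater !== null ) {
				// Replace the entry with an updated copy, leaving the
				// original object untouched.
				$updated = clone $arg;
				$updated->v = $updater( $current );
				$extArgs[$i] = $updated;
			}
			return $current;
		}
	}
	return null;
}

$args = [ new ExampleArg( 'group', 'notes' ) ];
$old = findAndUpdateExample( $args, 'group', fn ( string $v ): string => strtoupper( $v ) );
var_dump( $old, $args[0]->v );
```

Passing the array by reference is what lets the caller observe the updated value afterwards, which matches the `array &$extArgs` signature documented above.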
Wikimedia\Parsoid\Ext\ParsoidExtensionAPI::getContentDOM | ( | string | $contentId | ) |
Get the content DOM corresponding to an id.
string | $contentId |
Wikimedia\Parsoid\Ext\ParsoidExtensionAPI::getMetadata | ( | ) |
Get the ContentMetadataCollector corresponding to the top-level page.
In Parsoid integrated mode this will typically be an instance of core's ParserOutput class.
Wikimedia\Parsoid\Ext\ParsoidExtensionAPI::getOrigSrc | ( | Element | $elt, |
bool | $inner, | ||
callable | $checkIfOrigSrcReusable ) |
Get the original source for an element.
The callable, $checkIfOrigSrcReusable, is used to determine if the $elt is unedited and therefore valid to reuse source. This is assumed to be pretty specific to the callsite so no default is provided.
Element | $elt | |
bool | $inner | |
callable | $checkIfOrigSrcReusable |
Wikimedia\Parsoid\Ext\ParsoidExtensionAPI::getPageConfig | ( | ) |
Get the page configuration. FIXME: Unsure if we need to provide this access yet.
Wikimedia\Parsoid\Ext\ParsoidExtensionAPI::getPageUri | ( | ) |
Get a URI for the current page.
Wikimedia\Parsoid\Ext\ParsoidExtensionAPI::getSiteConfig | ( | ) |
Get the site configuration to let extensions customize their behavior based on how the wiki is configured.
Wikimedia\Parsoid\Ext\ParsoidExtensionAPI::getTempNodeData | ( | Element | $node, |
string | $key ) |
Get temporary data from the DOM node that will be discarded when DOM is serialized.
This should only be used when the ExtensionTag is not available; otherwise access the newly created data directly.
Element | $node | |
string | $key | to access TmpData |
Wikimedia\Parsoid\Ext\ParsoidExtensionAPI::getTitleUri | ( | Title | $title | ) |
Get the URI to link to a title.
Title | $title |
Wikimedia\Parsoid\Ext\ParsoidExtensionAPI::getTopLevelDoc | ( | ) |
Returns the main document we're parsing.
Extension content is parsed to fragments of this document.
Wikimedia\Parsoid\Ext\ParsoidExtensionAPI::htmlToDom | ( | string | $html, |
?Document | $doc = null, | ||
?array | $options = [] ) |
Parse input string into DOM.
NOTE: This leaves the DOM in Parsoid-canonical state and is the preferred method to convert HTML to DOM that will be passed into Parsoid's processing code.
string | $html | |
?Document | $doc | XXX You probably don't want to be doing this |
?array | $options |
Wikimedia\Parsoid\Ext\ParsoidExtensionAPI::htmlToWikitext | ( | array | $opts, |
string | $html ) |
Convert the HTML body of an extension to wikitext.
array | $opts |
string | $html | HTML for the extension's body |
Wikimedia\Parsoid\Ext\ParsoidExtensionAPI::inTemplate | ( | ) |
Are we parsing in a template context?
Wikimedia\Parsoid\Ext\ParsoidExtensionAPI::isPreview | ( | ) |
Are we parsing for a preview? FIXME: Right now, we never do; when we do, this needs to be modified to reflect reality.
@unstable
Wikimedia\Parsoid\Ext\ParsoidExtensionAPI::log | ( | string | $prefix, |
$args ) |
Forwards the logging request to the underlying logger.
string | $prefix | |
mixed | ...$args |
Wikimedia\Parsoid\Ext\ParsoidExtensionAPI::makeTitle | ( | string | $str, |
int | $namespaceId ) |
Make a title from an input string.
string | $str | |
int | $namespaceId |
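static Wikimedia\Parsoid\Ext\ParsoidExtensionAPI::migrateChildrenAndTransferWrapperDataAttribs | ( | Element | $from, |
Element | $to ) |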
Copy $from->childNodes to $to and clone the data attributes of $from to $to.
Element | $from | |
Element | $to |
Wikimedia\Parsoid\Ext\ParsoidExtensionAPI::newAboutId | ( | ) |
Get a new about id for marking extension output. FIXME: This should never really be needed since the extension API handles this on behalf of extensions, but Cite has one use case where implicit <references /> output is added.
Wikimedia\Parsoid\Ext\ParsoidExtensionAPI::parentExtTag | ( | ) |
If we are parsing in the context of a parent extension tag, return the name of that extension tag. FIXME: Is this something that can come from the frame?
Wikimedia\Parsoid\Ext\ParsoidExtensionAPI::parentExtTagOpts | ( | ) |
If we are parsing in the context of a parent extension tag, return the parsing options set by that tag. FIXME: Is this something that can come from the frame?
Wikimedia\Parsoid\Ext\ParsoidExtensionAPI::postProcessDOM | ( | Document | $doc | ) |
EXTAPI-FIXME: We have to figure out what it means to run a DOM PP pass (and what processors and what handlers apply) on content models that are not wikitext.
For now, we are only storing data attribs back to the DOM and adding metadata to the page.
Document | $doc |
Wikimedia\Parsoid\Ext\ParsoidExtensionAPI::preprocessWikitext | ( | string | $wikitext | ) |
Equivalent of 'preprocess' from Parser.php in core.
string | $wikitext |
Wikimedia\Parsoid\Ext\ParsoidExtensionAPI::processAttributeEmbeddedHTML | ( | Element | $elt, |
Closure | $proc ) |
Extensions might be interested in examining (their) content embedded in data-mw attributes that don't otherwise show up in the DOM.
Ex: inline media captions that aren't rendered, language variant markup, attributes that are transcluded. More scenarios might be added later.
Element | $elt | The node whose data attributes need to be examined |
Closure | $proc | The processor that will process the embedded HTML. Signature: (string) -> string. This processor will be provided the HTML string as input and is expected to return a possibly modified string. |
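A processor with the documented (string) -> string shape might look like the sketch below. The transformation and the sample HTML are invented for illustration, and the closure is invoked directly here rather than through processAttributeEmbeddedHTML, so the example runs without Parsoid.

```php
<?php
// A processor matching the documented shape: HTML string in, HTML string out.
$proc = function ( string $html ): string {
	// Illustrative transformation only: add a marker class to every <a> tag
	// that already has at least one attribute.
	return str_replace( '<a ', '<a class="example-seen" ', $html );
};

// Stand-in for HTML that would be embedded in a data-mw attribute.
$embedded = '<a href="./Foo">Foo</a> caption';
$result = $proc( $embedded );

echo $result, "\n";
```

Returning the input unchanged is also consistent with the "possibly modified" wording, so a read-only inspection pass can simply return `$html` as received.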
Wikimedia\Parsoid\Ext\ParsoidExtensionAPI::pushError | ( | string | $key, |
$params ) |
Collect errors while parsing.
If processing can't continue, an ExtensionError should be thrown instead.
$key and $params are basically the arguments to wfMessage, although they will be stored in the data-mw of the encapsulation wrapper.
See https://www.mediawiki.org/wiki/Specs/HTML#Error_handling
The returned fragment can be inserted in the DOM and will be populated with the localized message. See T266666.
@unstable
string | $key | |
mixed | ...$params |
Wikimedia\Parsoid\Ext\ParsoidExtensionAPI::renderMedia | ( | string | $titleStr, |
array | $imageOpts, | ||
?string & | $error = null, | ||
?bool | $forceBlock = false, | ||
?bool | $suppressMediaFormats = false ) |
Produce the HTML rendering of a title string and media options as the wikitext parser would for a wikilink in the file namespace.
string | $titleStr | Image title string |
array | $imageOpts | Array of a mix of strings or arrays, the latter of which can signify that the value came from source. In the array form, [0] is the fully-constructed image option and [1] is the full wikitext source offset for it. |
?string | &$error | Error string is set when the return is null. |
?bool | $forceBlock | Forces the media to be rendered in a figure as opposed to a span. |
?bool | $suppressMediaFormats | If any media format is present in $imageOpts, it won't be applied and will result in a linting error. |
Wikimedia\Parsoid\Ext\ParsoidExtensionAPI::serializeMedia | ( | MediaStructure | $ms | ) |
Serialize a MediaStructure to a title and media options string.
The converse to ::renderMedia.
MediaStructure | $ms |
Wikimedia\Parsoid\Ext\ParsoidExtensionAPI::setHtml2wtStateFlag | ( | string | $flag | ) |
FIXME: This is a bit broken - shouldn't be needed ideally.
string | $flag |
Wikimedia\Parsoid\Ext\ParsoidExtensionAPI::setTempNodeData | ( | Element | $node, |
$data ) |
Set temporary data into the DOM node that will be discarded when DOM is serialized.
Use the tag name as the key for TempData management.
Element | $node | |
mixed | $data |
Wikimedia\Parsoid\Ext\ParsoidExtensionAPI::wikitextToDOM | ( | string | $wikitext, |
array | $opts, | ||
bool | $sol ) |
Parse wikitext to DOM.
string | $wikitext | |
array | $opts |
bool | $sol | Whether tokens should be processed in start-of-line context. |