Parsoid
A bidirectional parser between wikitext and HTML5
Wikimedia\Parsoid\Ext\ParsoidExtensionAPI Class Reference

Extensions are expected to use only these interfaces and are strongly discouraged from calling Parsoid code directly.

Public Member Functions

 __construct (Env $env, ?array $options=null)
 
 pushError (string $key,... $params)
 Collect errors while parsing.
 
 getErrors ()
 
 getTopLevelDoc ()
 Returns the main document we're parsing.
 
 newAboutId ()
 Get a new about id for marking extension output. FIXME: This should never really be needed, since the extension API handles this on behalf of extensions, but Cite has one use case where implicit <references /> output is added.
 
 getSiteConfig ()
 Get the site configuration to let extensions customize their behavior based on how the wiki is configured.
 
 getPageConfig ()
 Get the page configuration. FIXME: Unsure if we need to provide this access yet.
 
 getTitleUri (Title $title)
 Get the URI to link to a title.
 
 getPageUri ()
 Get a URI for the current page.
 
 makeTitle (string $str, int $namespaceId)
 Make a title from an input string.
 
 inTemplate ()
 Are we parsing in a template context?
 
 parentExtTag ()
 FIXME: Is this something that can come from the frame? If we are parsing in the context of a parent extension tag, return the name of that extension tag.
 
 parentExtTagOpts ()
 FIXME: Is this something that can come from the frame? If we are parsing in the context of a parent extension tag, return the parsing options set by that tag.
 
 getContentDOM (string $contentId)
 Get the content DOM corresponding to an id.
 
 clearContentDOM (string $contentId)
 
 wikitextToDOM (string $wikitext, array $opts, bool $sol)
 Parse wikitext to DOM.
 
 extTagToDOM (array $extArgs, string $leadingWS, string $wikitext, array $opts)
 Parse extension tag to DOM.
 
 extArgToDOM (array $extArgs, string $key, string $context="inline")
 Process a specific extension arg as wikitext and return its DOM equivalent.
 
 extArgsToArray (array $extArgs)
 Convert the ext args representation from an array of KV objects to a plain associative array mapping arg name strings to arg value strings.
 
 findAndUpdateArg (array &$extArgs, string $key, ?Closure $updater=null)
 This method finds a requested arg by key name and returns its current value.
 
 addNewArg (array &$extArgs, string $key, string $value)
 This method adds a new argument to the extension args array.
 
 log (... $args)
 Forwards the logging request to the underlying logger.
 
 processHiddenHTMLInDataAttributes (Element $elt, Closure $proc)
 Extensions might be interested in examining their content embedded in data-mw attributes that don't otherwise show up in the DOM.
 
 htmlToDom (string $html)
 Parse input string into DOM.
 
 domToHtml (Node $node, bool $innerHTML=false, bool $releaseDom=false)
 Serialize DOM element to string (inner/outer HTML is controlled by flag).
 
 setHtml2wtStateFlag (string $flag)
 FIXME: This is a bit broken - shouldn't be needed ideally.
 
 extStartTagToWikitext (Element $node)
 Emit the opening tag (including attributes) for the extension represented by this node.
 
 domToWikitext (array $opts, Element $node, bool $releaseDom=false)
 Convert the input DOM to wikitext.
 
 htmlToWikitext (array $opts, string $html)
 Convert the HTML body of an extension to wikitext.
 
 domChildrenToWikitext (Element $elt, int $context)
 
 escapeWikitext (string $str, Node $node, int $context)
 Escape any wikitext-like constructs in a string so that when the output is parsed, it renders as a string.
 
 postProcessDOM (Document $doc)
 EXTAPI-FIXME: We have to figure out what it means to run a DOM PP pass (and what processors and what handlers apply) on content models that are not wikitext.
 
 renderMedia (string $titleStr, array $imageOpts, ?string &$error=null, ?bool $forceBlock=false)
 
 addModules (array $modules)
 
 addModuleStyles (array $modulestyles)
 

Static Public Member Functions

static migrateChildrenAndTransferWrapperDataAttribs (Element $from, Element $to)
 Copy $from->childNodes to $to and clone the data attributes of $from to $to.
 

Public Attributes

 $extTag
 
const IN_SOL = 1
 Bit flags describing escaping / serializing context in html -> wt mode.
 
const IN_MEDIA = 2
 
const IN_LINK = 4
 
const IN_IMG_CAPTION = 8
 
const IN_OPTION = 16
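
Because these values are bit flags, several of them can be packed into one integer context argument. A minimal sketch of combining and testing them, assuming local constants that mirror the values listed above (the variable names are illustrative and not part of the API):

```php
<?php
// Values mirror the constants listed above; in real code, reference them as
// ParsoidExtensionAPI::IN_SOL, ParsoidExtensionAPI::IN_LINK, and so on.
const IN_SOL = 1;
const IN_MEDIA = 2;
const IN_LINK = 4;
const IN_IMG_CAPTION = 8;
const IN_OPTION = 16;

// Combine flags with bitwise OR: start-of-line and inside a link.
$context = IN_SOL | IN_LINK;

// Test for an individual flag with bitwise AND.
$atStartOfLine = ( $context & IN_SOL ) !== 0;
$inMedia = ( $context & IN_MEDIA ) !== 0;
```

Each constant is a distinct power of two, so any combination can be stored and queried without ambiguity.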
 

Detailed Description

Extensions are expected to use only these interfaces and are strongly discouraged from calling Parsoid code directly.

Code review is expected to catch these discouraged code patterns. We'll have to finish grappling with the extension and hooks API to go down this path seriously. Until then, extensions will leverage existing code, as the native extension code in this repository does.

Constructor & Destructor Documentation

◆ __construct()

Wikimedia\Parsoid\Ext\ParsoidExtensionAPI::__construct ( Env  $env,
?array  $options = null 
)
Parameters
Env $env
?array $options
  • wt2html: used in wt->html direction
    • frame: (Frame)
    • parseOpts: (array)
      • extTag: (string)
      • extTagOpts: (array)
      • inTemplate: (bool)
    • extTag: (ExtensionTag)
  • html2wt: used in html->wt direction
    • state: (SerializerState)

Member Function Documentation

◆ addModules()

Wikimedia\Parsoid\Ext\ParsoidExtensionAPI::addModules ( array  $modules)
Parameters
array $modules

◆ addModuleStyles()

Wikimedia\Parsoid\Ext\ParsoidExtensionAPI::addModuleStyles ( array  $modulestyles)
Parameters
array $modulestyles

◆ addNewArg()

Wikimedia\Parsoid\Ext\ParsoidExtensionAPI::addNewArg ( array &  $extArgs,
string  $key,
string  $value 
)

This method adds a new argument to the extension args array.

Parameters
KV[] &$extArgs
string $key
string $value

◆ clearContentDOM()

Wikimedia\Parsoid\Ext\ParsoidExtensionAPI::clearContentDOM ( string  $contentId)
Parameters
string $contentId

◆ domChildrenToWikitext()

Wikimedia\Parsoid\Ext\ParsoidExtensionAPI::domChildrenToWikitext ( Element  $elt,
int  $context 
)
Parameters
Element $elt
int $context: OR-ed bit flags specifying escaping / serialization context
Returns
string

◆ domToHtml()

Wikimedia\Parsoid\Ext\ParsoidExtensionAPI::domToHtml ( Node  $node,
bool  $innerHTML = false,
bool  $releaseDom = false 
)

Serialize DOM element to string (inner/outer HTML is controlled by flag).

If $releaseDom is set to true, the DOM will be left in non-canonical form and is not safe to use after this call. This is primarily a performance optimization.

Parameters
Node $node
bool $innerHTML: If true, the inner HTML of the element will be returned. This flag defaults to false.
bool $releaseDom: If true, the DOM will not be in canonical form after this call. This flag defaults to false.
Returns
string
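
The difference between inner and outer HTML can be illustrated with PHP's bundled DOM extension. This is only a sketch under that assumption: `demoNodeToHtml` is a hypothetical helper, not Parsoid's implementation, and it ignores the `$releaseDom` behavior entirely.

```php
<?php
/**
 * Serialize a node as outer HTML (the node itself) or inner HTML
 * (its children only). Assumes $node belongs to a DOMDocument.
 */
function demoNodeToHtml( DOMNode $node, bool $innerHTML = false ): string {
	$doc = $node->ownerDocument;
	if ( !$innerHTML ) {
		return $doc->saveHTML( $node );
	}
	$html = '';
	foreach ( $node->childNodes as $child ) {
		$html .= $doc->saveHTML( $child );
	}
	return $html;
}

$doc = new DOMDocument();
$p = $doc->createElement( 'p' );
$p->appendChild( $doc->createElement( 'b', 'bold' ) );
$p->appendChild( $doc->createTextNode( ' text' ) );
$doc->appendChild( $p );

$outer = demoNodeToHtml( $p );       // includes the <p> wrapper
$inner = demoNodeToHtml( $p, true ); // children of <p> only
```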

◆ domToWikitext()

Wikimedia\Parsoid\Ext\ParsoidExtensionAPI::domToWikitext ( array  $opts,
Element  $node,
bool  $releaseDom = false 
)

Convert the input DOM to wikitext.

Parameters
array $opts
  • extName: (string) Name of the extension whose body we are serializing
  • inPHPBlock: (bool) FIXME: This needs to be removed
Element $node: DOM to serialize
bool $releaseDom: If true, the DOM will be left in non-canonical form and is not safe to use after this call. This is primarily a performance optimization. This flag defaults to false.
Returns
mixed

◆ escapeWikitext()

Wikimedia\Parsoid\Ext\ParsoidExtensionAPI::escapeWikitext ( string  $str,
Node  $node,
int  $context 
)

Escape any wikitext-like constructs in a string so that when the output is parsed, it renders as a string.

The escaping is sensitive to the context in which the string is embedded. For example, a "*" is not safe at the start of a line (since it will parse as a list item), but is safe if it is not in a start of line context. Similarly the "|" character is safe outside tables, links, and transclusions.

Parameters
string $str
Node $node
int $context: OR-ed bit flags specifying escaping / serialization context
Returns
string
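
To make the context sensitivity concrete, here is a deliberately tiny toy escaper. It is not Parsoid's escaping algorithm: `demoEscapeLeadingStar` and `DEMO_IN_SOL` are hypothetical names, and the only rule modeled is the "*" at start-of-line case described above.

```php
<?php
// Local stand-in for ParsoidExtensionAPI::IN_SOL.
const DEMO_IN_SOL = 1;

/**
 * Toy escaper: protects a leading "*" only in start-of-line context,
 * where it would otherwise parse as a list item.
 */
function demoEscapeLeadingStar( string $str, int $context ): string {
	$atSol = ( $context & DEMO_IN_SOL ) !== 0;
	if ( $atSol && $str !== '' && $str[0] === '*' ) {
		return '<nowiki>*</nowiki>' . substr( $str, 1 );
	}
	// Outside start-of-line context the "*" is harmless and left alone.
	return $str;
}

$escaped = demoEscapeLeadingStar( '* not a list', DEMO_IN_SOL );
$untouched = demoEscapeLeadingStar( '* not a list', 0 );
```

The same input string yields different output depending only on the context flags, which is the property the real method's `$context` argument exists to express.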

◆ extArgsToArray()

Wikimedia\Parsoid\Ext\ParsoidExtensionAPI::extArgsToArray ( array  $extArgs)

Convert the ext args representation from an array of KV objects to a plain associative array mapping arg name strings to arg value strings.

Parameters
array<KV> $extArgs
Returns
array<string,string>
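
The conversion can be pictured with a small stand-in. In this sketch, `DemoKV` is a hypothetical class that models only a string key and a string value, and `demoArgsToArray` is an illustrative helper rather than the actual method body.

```php
<?php
// Hypothetical stand-in for Parsoid's KV objects; only a key and a value
// are modeled here.
final class DemoKV {
	public function __construct( public string $k, public string $v ) {
	}
}

/**
 * Flatten a list of key/value objects into a name => value map.
 * If a key repeats, the later value overwrites the earlier one
 * (a choice made for this sketch).
 */
function demoArgsToArray( array $extArgs ): array {
	$out = [];
	foreach ( $extArgs as $kv ) {
		$out[$kv->k] = $kv->v;
	}
	return $out;
}

$args = demoArgsToArray( [
	new DemoKV( 'name', 'foo' ),
	new DemoKV( 'group', 'notes' ),
] );
```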

◆ extArgToDOM()

Wikimedia\Parsoid\Ext\ParsoidExtensionAPI::extArgToDOM ( array  $extArgs,
string  $key,
string  $context = "inline" 
)

Process a specific extension arg as wikitext and return its DOM equivalent.

By default, this method processes the argument value in inline context and normalizes every whitespace character to a single space.

Parameters
KV[] $extArgs
string $key: should be lower-case
string $context
Returns
?DocumentFragment
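
The whitespace normalization mentioned above might look like the following. This is a sketch under assumptions: the helper name is hypothetical, the character set (tab, CR, LF, space) is a guess, and it reads the description as replacing each whitespace character individually rather than collapsing runs.

```php
<?php
/**
 * Replace each whitespace character (tab, CR, LF, space) with a single
 * space. Runs of whitespace are not collapsed in this sketch.
 */
function demoNormalizeArgWhitespace( string $value ): string {
	return preg_replace( '/[\t\r\n ]/', ' ', $value );
}

$normalized = demoNormalizeArgWhitespace( "first\tsecond\nthird" );
```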

◆ extStartTagToWikitext()

Wikimedia\Parsoid\Ext\ParsoidExtensionAPI::extStartTagToWikitext ( Element  $node)

Emit the opening tag (including attributes) for the extension represented by this node.

Parameters
Element $node
Returns
string

◆ extTagToDOM()

Wikimedia\Parsoid\Ext\ParsoidExtensionAPI::extTagToDOM ( array  $extArgs,
string  $leadingWS,
string  $wikitext,
array  $opts 
)

Parse extension tag to DOM.

Beyond parsing the contents of the extension tag, this wraps the contents in a custom wrapper element, sanitizes the extension args, and sets some content flags on the wrapper.

Parameters
array $extArgs
string $leadingWS
string $wikitext
array $opts
  • srcOffsets
  • frame
  • wrapperTag
  • parseOpts
    • extTag
    • extTagOpts
    • context
Returns
DocumentFragment

◆ findAndUpdateArg()

Wikimedia\Parsoid\Ext\ParsoidExtensionAPI::findAndUpdateArg ( array &  $extArgs,
string  $key,
?Closure  $updater = null 
)

This method finds a requested arg by key name and returns its current value.

If a closure is passed in to update the current value, it is used to update the arg.

Parameters
KV[] &$extArgs: Array of extension args
string $key: Argument key whose value needs an update
?Closure $updater: $updater will get the existing string value for the arg and is expected to return an updated value.
Returns
?string
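
The find-then-optionally-update pattern could be sketched as follows. `DemoArg` and `demoFindAndUpdateArg` are hypothetical stand-ins, and two behaviors are assumptions of this sketch rather than documented facts: keys are matched case-insensitively, and the value returned is the one held before the update.

```php
<?php
// Hypothetical stand-in for Parsoid's KV objects.
final class DemoArg {
	public function __construct( public string $k, public string $v ) {
	}
}

/**
 * Find an arg by key and return the value it held before any update.
 * If $updater is given, store its result as the new value.
 */
function demoFindAndUpdateArg(
	array &$extArgs, string $key, ?Closure $updater = null
): ?string {
	foreach ( $extArgs as $i => $kv ) {
		if ( strtolower( trim( $kv->k ) ) === strtolower( $key ) ) {
			$old = $kv->v;
			if ( $updater !== null ) {
				// Replace the entry with an updated copy instead of mutating it.
				$extArgs[$i] = new DemoArg( $kv->k, $updater( $old ) );
			}
			return $old;
		}
	}
	// No arg with that key.
	return null;
}

$extArgs = [ new DemoArg( 'name', 'foo' ) ];
$previous = demoFindAndUpdateArg(
	$extArgs, 'name', static fn ( string $v ): string => strtoupper( $v )
);
$missing = demoFindAndUpdateArg( $extArgs, 'group' );
```

Passing the array by reference is what lets the caller observe the updated entry after the call returns.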

◆ getContentDOM()

Wikimedia\Parsoid\Ext\ParsoidExtensionAPI::getContentDOM ( string  $contentId)

Get the content DOM corresponding to an id.

Parameters
string $contentId
Returns
DocumentFragment

◆ getErrors()

Wikimedia\Parsoid\Ext\ParsoidExtensionAPI::getErrors ( )
Returns
array

◆ getPageConfig()

Wikimedia\Parsoid\Ext\ParsoidExtensionAPI::getPageConfig ( )

Get the page configuration. FIXME: Unsure if we need to provide this access yet.

Returns
PageConfig

◆ getPageUri()

Wikimedia\Parsoid\Ext\ParsoidExtensionAPI::getPageUri ( )

Get a URI for the current page.

Returns
string

◆ getSiteConfig()

Wikimedia\Parsoid\Ext\ParsoidExtensionAPI::getSiteConfig ( )

Get the site configuration to let extensions customize their behavior based on how the wiki is configured.

Returns
SiteConfig

◆ getTitleUri()

Wikimedia\Parsoid\Ext\ParsoidExtensionAPI::getTitleUri ( Title  $title)

Get the URI to link to a title.

Parameters
Title $title
Returns
string

◆ getTopLevelDoc()

Wikimedia\Parsoid\Ext\ParsoidExtensionAPI::getTopLevelDoc ( )

Returns the main document we're parsing.

Extension content is parsed to fragments of this document.

Returns
Document

◆ htmlToDom()

Wikimedia\Parsoid\Ext\ParsoidExtensionAPI::htmlToDom ( string  $html)

Parse input string into DOM.

NOTE: This leaves the DOM in Parsoid-canonical state and is the preferred method to convert HTML to DOM that will be passed into Parsoid's processing code.

Parameters
string $html
Returns
DocumentFragment

◆ htmlToWikitext()

Wikimedia\Parsoid\Ext\ParsoidExtensionAPI::htmlToWikitext ( array  $opts,
string  $html 
)

Convert the HTML body of an extension to wikitext.

Parameters
array $opts
  • extName: (string) Name of the extension whose body we are serializing
  • inPHPBlock: (bool) FIXME: This needs to be removed
string $html: HTML for the extension's body
Returns
string

◆ inTemplate()

Wikimedia\Parsoid\Ext\ParsoidExtensionAPI::inTemplate ( )

Are we parsing in a template context?

Returns
bool

◆ log()

Wikimedia\Parsoid\Ext\ParsoidExtensionAPI::log ( ...  $args)

Forwards the logging request to the underlying logger.

Parameters
mixed ...$args

◆ makeTitle()

Wikimedia\Parsoid\Ext\ParsoidExtensionAPI::makeTitle ( string  $str,
int  $namespaceId 
)

Make a title from an input string.

Parameters
string $str
int $namespaceId
Returns
?Title

◆ migrateChildrenAndTransferWrapperDataAttribs()

static Wikimedia\Parsoid\Ext\ParsoidExtensionAPI::migrateChildrenAndTransferWrapperDataAttribs ( Element  $from,
Element  $to 
)

Copy $from->childNodes to $to and clone the data attributes of $from to $to.

Parameters
Element $from
Element $to
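
A rough picture of this operation, using PHP's bundled DOM extension. Everything here is an assumption made for illustration: `demoMigrateChildren` is a hypothetical helper, children are moved rather than duplicated, and "data attributes" are simplified to plain attributes whose names start with `data-`, which may not match how Parsoid stores them.

```php
<?php
/**
 * Move every child of $from into $to, then copy attributes whose names
 * start with "data-" from $from to $to.
 */
function demoMigrateChildren( DOMElement $from, DOMElement $to ): void {
	// appendChild() detaches the node from $from, so the loop terminates.
	while ( $from->firstChild !== null ) {
		$to->appendChild( $from->firstChild );
	}
	foreach ( $from->attributes as $attr ) {
		if ( str_starts_with( $attr->name, 'data-' ) ) {
			$to->setAttribute( $attr->name, $attr->value );
		}
	}
}

$doc = new DOMDocument();
$from = $doc->createElement( 'span' );
$from->setAttribute( 'data-mw', '{}' );
$from->appendChild( $doc->createTextNode( 'hello' ) );
$to = $doc->createElement( 'div' );

demoMigrateChildren( $from, $to );
```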

◆ newAboutId()

Wikimedia\Parsoid\Ext\ParsoidExtensionAPI::newAboutId ( )

Get a new about id for marking extension output. FIXME: This should never really be needed, since the extension API handles this on behalf of extensions, but Cite has one use case where implicit <references /> output is added.

Returns
string

◆ parentExtTag()

Wikimedia\Parsoid\Ext\ParsoidExtensionAPI::parentExtTag ( )

FIXME: Is this something that can come from the frame? If we are parsing in the context of a parent extension tag, return the name of that extension tag.

Returns
string|null

◆ parentExtTagOpts()

Wikimedia\Parsoid\Ext\ParsoidExtensionAPI::parentExtTagOpts ( )

FIXME: Is this something that can come from the frame? If we are parsing in the context of a parent extension tag, return the parsing options set by that tag.

Returns
array

◆ postProcessDOM()

Wikimedia\Parsoid\Ext\ParsoidExtensionAPI::postProcessDOM ( Document  $doc)

EXTAPI-FIXME: We have to figure out what it means to run a DOM PP pass (and what processors and what handlers apply) on content models that are not wikitext.

For now, we are only storing data attribs back to the DOM and adding metadata to the page.

Parameters
Document $doc

◆ processHiddenHTMLInDataAttributes()

Wikimedia\Parsoid\Ext\ParsoidExtensionAPI::processHiddenHTMLInDataAttributes ( Element  $elt,
Closure  $proc 
)

Extensions might be interested in examining their content embedded in data-mw attributes that don't otherwise show up in the DOM.

Ex: inline media captions that aren't rendered, language variant markup, attributes that are transcluded. More scenarios might be added later.

Parameters
Element $elt: The node whose data attributes need to be examined
Closure $proc: The processor that will process the embedded HTML

◆ pushError()

Wikimedia\Parsoid\Ext\ParsoidExtensionAPI::pushError ( string  $key,
...  $params 
)

Collect errors while parsing.

If processing can't continue, an ExtensionError should be thrown instead.

$key and $params are basically the arguments to wfMessage, although they will be stored in the data-mw of the encapsulation wrapper.

See https://www.mediawiki.org/wiki/Specs/HTML#Error_handling

The returned fragment can be inserted in the DOM and will be populated with the localized message. See T266666.

@unstable

Parameters
string $key
mixed ...$params
Returns
DocumentFragment

◆ renderMedia()

Wikimedia\Parsoid\Ext\ParsoidExtensionAPI::renderMedia ( string  $titleStr,
array  $imageOpts,
?string &  $error = null,
?bool  $forceBlock = false 
)
Parameters
string $titleStr: Image title string
array $imageOpts: Array of a mix of strings or arrays, the latter of which can signify that the value came from source, where [0] is the fully-constructed image option and [1] is the full wikitext source offset for it.
?string &$error
?bool $forceBlock
Returns
?Element

◆ setHtml2wtStateFlag()

Wikimedia\Parsoid\Ext\ParsoidExtensionAPI::setHtml2wtStateFlag ( string  $flag)

FIXME: This is a bit broken - shouldn't be needed ideally.

Parameters
string $flag

◆ wikitextToDOM()

Wikimedia\Parsoid\Ext\ParsoidExtensionAPI::wikitextToDOM ( string  $wikitext,
array  $opts,
bool  $sol 
)

Parse wikitext to DOM.

Parameters
string $wikitext
array $opts
  • srcOffsets
  • frame
  • parseOpts
    • extTag
    • extTagOpts
    • context: "inline", "block", etc. Currently, only "inline" is supported.
bool $sol
Returns
DocumentFragment
