These utilities are for processing content that is generated by parsing source input (e.g. wikitext).
| static | toXML (Node $node, array $options=[]) | XML serializer. |
| static | ppToXML (Node $node, array $options=[]) | Data-object-aware XML serializer, to be used in the DOM post-processing phase. |
| static | createAndLoadDocument (string $html, array $options=[]) | Create a new prepared document with the given HTML and load the data attributes. |
| static | createAndLoadDocumentFragment (Document $doc, string $html, ?array $options=null) | |
| static | stripUnnecessaryWrappersAndSyntheticNodes (Element $node) | Strip Parsoid-inserted section wrappers, annotation wrappers, and synthetic nodes (fallback id spans with HTML4 ids for headings, auto-generated TOC metas and possibly other such in the future) from the DOM. |
| static | processAttributeEmbeddedDom (SiteConfig $siteConfig, Element $elt, callable $proc) | Extensions might be interested in examining their content embedded in data-mw attributes that don't otherwise show up in the DOM. |
| static | shiftDSR (Env $env, Node $rootNode, callable $dsrFunc) | Shift the DOM Source Range (DSR) of a DOM fragment. |
| static | convertOffsets (Env $env, Document $doc, string $from, string $to) | Convert DSR offsets in a Document between utf-8/ucs2/codepoint indices. |
| static | dumpDOM (Node $rootNode, string $title='', array $options=[]) | Dump the DOM with attributes. |
◆ convertOffsets()
static Wikimedia\Parsoid\Utils\ContentUtils::convertOffsets ( Env $env, Document $doc, string $from, string $to )
Convert DSR offsets in a Document between utf-8/ucs2/codepoint indices.
Offset types are:
- 'byte': Bytes (UTF-8 encoding), e.g. PHP substr() or strlen().
- 'char': Unicode code points (encoding irrelevant), e.g. PHP mb_substr() or mb_strlen().
- 'ucs2': 16-bit code units (UTF-16 encoding), e.g. JavaScript .substring() or .length.
- See also
- TokenUtils::convertTokenOffsets for a related function on tokens.
- Parameters
-
| Env | $env | |
| Document | $doc | The document to convert |
| string | $from | Offset type to convert from. |
| string | $to | Offset type to convert to. |
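The three offset types can give different numbers for the same string. The following is a minimal sketch in plain PHP (it does not call Parsoid, and the sample string and variable names are illustrative assumptions) that measures one string in each unit, using the functions named in the list above plus mb_convert_encoding() to count UTF-16 code units:

```php
<?php
// Illustrative string: 'a' (ASCII), 'é' (U+00E9), and an emoji outside the BMP (U+1F600).
$text = "a\u{00E9}\u{1F600}";

// 'byte': number of bytes in the UTF-8 encoding (1 + 2 + 4).
$byteLen = strlen( $text );

// 'char': number of Unicode code points.
$charLen = mb_strlen( $text, 'UTF-8' );

// 'ucs2': number of 16-bit code units in the UTF-16 encoding (1 + 1 + 2),
// which is what JavaScript's .length would report for this string.
$ucs2Len = intdiv( strlen( mb_convert_encoding( $text, 'UTF-16LE', 'UTF-8' ) ), 2 );

echo "byte=$byteLen char=$charLen ucs2=$ucs2Len\n";
```

An offset pointing just past the emoji is therefore 7 in 'byte' units, 3 in 'char' units, and 4 in 'ucs2' units, which is why offsets have to be converted when they cross between PHP and JavaScript consumers.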
◆ createAndLoadDocument()
static Wikimedia\Parsoid\Utils\ContentUtils::createAndLoadDocument ( string $html, array $options = [] )
Create a new prepared document with the given HTML and load the data attributes.
Don't use this inside of the parser pipeline: it shouldn't be necessary to create new documents when parsing or serializing. A document lives on the environment which can be used to create fragments. The bag added as a dynamic property to the PHP wrapper around the libxml doc is at risk of being GC-ed.
- Parameters
-
| string | $html | |
| array | $options | |
- Returns
- Document
◆ createAndLoadDocumentFragment()
static Wikimedia\Parsoid\Utils\ContentUtils::createAndLoadDocumentFragment ( Document $doc, string $html, ?array $options = null )
- Parameters
-
| Document | $doc | |
| string | $html | |
| ?array | $options | Not used |
- Returns
- DocumentFragment
◆ dumpDOM()
static Wikimedia\Parsoid\Utils\ContentUtils::dumpDOM ( Node $rootNode, string $title = '', array $options = [] )
Dump the DOM with attributes.
- Parameters
-
| Node | $rootNode | |
| string | $title | |
| array | $options | Associative array of options: |
- quiet: Suppress separators
storeDataAttribs options:
- discardDataParsoid
- keepTmp
- storeInPageBundle
- storeDiffMark
- env
- idIndex
XHtmlSerializer options:
- smartQuote
- innerXML
- captureOffsets
- addDoctype
- Returns
- string The dump result
◆ ppToXML()
static Wikimedia\Parsoid\Utils\ContentUtils::ppToXML ( Node $node, array $options = [] )
Data-object-aware XML serializer, to be used in the DOM post-processing phase.
- Parameters
-
| Node | $node | |
| array | $options | Data attribute options, see DOMDataUtils::storeDataAttribs() for details. In addition, `$options['fragment']` should be set to true when serializing a DocumentFragment unconnected to the parent document; this ensures that we don't mistakenly mark the top-level document as "unloaded" if we were just serializing a fragment. |
Eventually, most places that serialize using the fragment option should be converted to store the DocumentFragment natively, instead of as a string (T348161).
- Returns
- string
◆ processAttributeEmbeddedDom()
static Wikimedia\Parsoid\Utils\ContentUtils::processAttributeEmbeddedDom ( SiteConfig $siteConfig, Element $elt, callable $proc )
Extensions might be interested in examining their content embedded in data-mw attributes that don't otherwise show up in the DOM.
Examples: inline media captions that aren't rendered, language variant markup, and attributes that are transcluded. More scenarios might be added later.
- Parameters
-
| SiteConfig | $siteConfig | |
| Element | $elt | The node whose data attributes need to be examined |
| callable(DocumentFragment):bool | $proc | The processor that will process the embedded HTML. This processor will be provided a DocumentFragment and is expected to return true if that fragment was modified. |
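To illustrate the contract described for `$proc` (receive a fragment, return true only if it was changed), here is a rough sketch of such a callback. It is an assumption-laden stand-in: it uses PHP's built-in DOM classes (DOMDocument, DOMDocumentFragment) rather than Parsoid's own DOM types, it does not call processAttributeEmbeddedDom(), and the comment-stripping behaviour is purely hypothetical.

```php
<?php
// Hypothetical processor: removes comment nodes from a fragment and
// reports whether anything was removed. Uses PHP's built-in DOM classes
// as a stand-in for Parsoid's DocumentFragment type.
$proc = static function ( DOMDocumentFragment $fragment ): bool {
	$modified = false;
	// Copy the child list first, because removing nodes mutates the live list.
	foreach ( iterator_to_array( $fragment->childNodes ) as $child ) {
		if ( $child instanceof DOMComment ) {
			$fragment->removeChild( $child );
			$modified = true;
		}
	}
	return $modified;
};

// Build a small fragment by hand to exercise the callback.
$doc = new DOMDocument();
$fragment = $doc->createDocumentFragment();
$fragment->appendChild( $doc->createElement( 'b', 'caption' ) );
$fragment->appendChild( $doc->createComment( 'note' ) );

$changed = $proc( $fragment );               // first pass removes the comment
$remaining = $fragment->childNodes->length;  // only the <b> element is left
$changedAgain = $proc( $fragment );          // second pass finds nothing to change
```

Returning false when nothing changed matters here because, per the parameter description, the return value is how the caller learns whether the fragment was modified.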
◆ shiftDSR()
static Wikimedia\Parsoid\Utils\ContentUtils::shiftDSR ( Env $env, Node $rootNode, callable $dsrFunc )
Shift the DOM Source Range (DSR) of a DOM fragment.
- Parameters
-
| Env | $env | |
| Node | $rootNode | |
| callable | $dsrFunc | |
◆ stripUnnecessaryWrappersAndSyntheticNodes()
static Wikimedia\Parsoid\Utils\ContentUtils::stripUnnecessaryWrappersAndSyntheticNodes ( Element $node )
Strip Parsoid-inserted section wrappers, annotation wrappers, and synthetic nodes (fallback id spans with HTML4 ids for headings, auto-generated TOC metas and possibly other such in the future) from the DOM.
- Parameters
-
| Element | $node | |
◆ toXML()
static Wikimedia\Parsoid\Utils\ContentUtils::toXML ( Node $node, array $options = [] )
XML Serializer.
- Parameters
-
| Node | $node | |
| array | $options | XHtmlSerializer options. |
- Returns
- string
The documentation for this class was generated from the following file:
- src/Utils/ContentUtils.php