Parsoid
A bidirectional parser between wikitext and HTML5
Wikimedia\Parsoid\Utils\ContentUtils Class Reference

These utilities process content generated by parsing source input (e.g. wikitext).

Static Public Member Functions

static toXML (Node $node, array $options=[])
 XML Serializer.
 
static ppToXML (Node $node, array $options=[])
 Data-object-aware XML serializer, to be used in the DOM post-processing phase.
 
static createDocument (string $html='', bool $validateXMLNames=false)
 XXX: Don't use this outside of testing.
 
static createAndLoadDocument (string $html, array $options=[])
 XXX: Don't use this outside of testing.
 
static createAndLoadDocumentFragment (Document $doc, string $html, array $options=[])
 
static extractDpAndSerialize (Node $node, array $options=[])
 Pull the data-parsoid script element out of the doc before serializing.
 
static stripUnnecessaryWrappersAndSyntheticNodes (Element $node)
 Strip Parsoid-inserted section wrappers, annotation wrappers, and synthetic nodes (fallback id spans with HTML4 ids for headings, auto-generated TOC metas, and possibly others in the future) from the DOM.
 
static processAttributeEmbeddedHTML (ParsoidExtensionAPI $extAPI, Element $elt, Closure $proc)
 Extensions might be interested in examining their content embedded in data-mw attributes that don't otherwise show up in the DOM.
 
static shiftDSR (Env $env, Node $rootNode, callable $dsrFunc, ParsoidExtensionAPI $extAPI)
 Shift the DOM Source Range (DSR) of a DOM fragment.
 
static convertOffsets (Env $env, Document $doc, string $from, string $to)
 Convert DSR offsets in a Document between UTF-8 byte, UTF-16 code unit (UCS-2), and Unicode code point indices.
 
static dumpDOM (Node $rootNode, string $title='', array $options=[])
 Dump the DOM with attributes.
 

Detailed Description

These utilities process content generated by parsing source input (e.g. wikitext).

Member Function Documentation

◆ convertOffsets()

static Wikimedia\Parsoid\Utils\ContentUtils::convertOffsets ( Env $env, Document $doc, string $from, string $to )

Convert DSR offsets in a Document between UTF-8 byte, UTF-16 code unit (UCS-2), and Unicode code point indices.

Offset types are:

  • 'byte': Bytes (UTF-8 encoding), e.g. PHP substr() or strlen().
  • 'char': Unicode code points (encoding irrelevant), e.g. PHP mb_substr() or mb_strlen().
  • 'ucs2': 16-bit code units (UTF-16 encoding), e.g. JavaScript .substring() or .length.
See also
TokenUtils::convertTokenOffsets for a related function on tokens.
Parameters
Env $env
Document $doc: The document to convert.
string $from: Offset type to convert from.
string $to: Offset type to convert to.
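
The three offset types listed above measure the same string differently. The following is a minimal sketch in plain PHP, not a call into Parsoid: the helper name offsetLengths() and the sample string are assumptions made for illustration, and the mbstring extension is assumed to be available. It only shows why an offset valid under one type cannot be reused under another without conversion.

```php
<?php
// Sample text: U+0061 "a" (1 byte in UTF-8), U+20AC euro sign (3 bytes),
// U+1F600 emoji (4 bytes in UTF-8, a surrogate pair in UTF-16).
$text = "a\u{20AC}\u{1F600}";

/**
 * Hypothetical helper: length of a UTF-8 string under each offset type.
 */
function offsetLengths(string $s): array {
    return [
        // 'byte': UTF-8 bytes, as counted by strlen()
        'byte' => strlen($s),
        // 'char': Unicode code points, as counted by mb_strlen()
        'char' => mb_strlen($s, 'UTF-8'),
        // 'ucs2': 16-bit code units, i.e. UTF-16 byte length divided by 2
        'ucs2' => intdiv(strlen(mb_convert_encoding($s, 'UTF-16LE', 'UTF-8')), 2),
    ];
}

print_r(offsetLengths($text));
```

For this sample the three counts are 8, 3, and 4, so a DSR offset pointing past the emoji differs in every unit.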

◆ createAndLoadDocument()

static Wikimedia\Parsoid\Utils\ContentUtils::createAndLoadDocument ( string $html, array $options = [] )

XXX: Don't use this outside of testing.

It shouldn't be necessary to create new documents when parsing or serializing. The environment already holds a document, which can be used to create fragments. The bag added as a dynamic property to the PHP wrapper around the libxml doc is at risk of being garbage-collected.

Parameters
string $html
array $options
Returns
Document

◆ createAndLoadDocumentFragment()

static Wikimedia\Parsoid\Utils\ContentUtils::createAndLoadDocumentFragment ( Document $doc, string $html, array $options = [] )

Parameters
Document $doc
string $html
array $options
Returns
DocumentFragment

◆ createDocument()

static Wikimedia\Parsoid\Utils\ContentUtils::createDocument ( string $html = '', bool $validateXMLNames = false )

XXX: Don't use this outside of testing.

It shouldn't be necessary to create new documents when parsing or serializing. The environment already holds a document, which can be used to create fragments. The bag added as a dynamic property to the PHP wrapper around the libxml doc is at risk of being garbage-collected.

Parameters
string $html
bool $validateXMLNames
Returns
Document

◆ dumpDOM()

static Wikimedia\Parsoid\Utils\ContentUtils::dumpDOM ( Node $rootNode, string $title = '', array $options = [] )

Dump the DOM with attributes.

Parameters
Node $rootNode
string $title
array $options: Associative array of options:
  • dumpFragmentMap: Dump the fragment map from env
  • quiet: Suppress separators

storeDataAttribs options:

  • discardDataParsoid
  • keepTmp
  • storeInPageBundle
  • storeDiffMark
  • env
  • idIndex

XMLSerializer options:

  • smartQuote
  • innerXML
  • captureOffsets
  • addDoctype

Returns
string: The dump result.

◆ extractDpAndSerialize()

static Wikimedia\Parsoid\Utils\ContentUtils::extractDpAndSerialize ( Node $node, array $options = [] )

Pull the data-parsoid script element out of the doc before serializing.

Parameters
Node $node
array $options: XMLSerializer options.
Returns
array

◆ ppToXML()

static Wikimedia\Parsoid\Utils\ContentUtils::ppToXML ( Node $node, array $options = [] )

Data-object-aware XML serializer, to be used in the DOM post-processing phase.

Parameters
Node $node
array $options
Returns
string

◆ processAttributeEmbeddedHTML()

static Wikimedia\Parsoid\Utils\ContentUtils::processAttributeEmbeddedHTML ( ParsoidExtensionAPI $extAPI, Element $elt, Closure $proc )

Extensions might be interested in examining their content embedded in data-mw attributes that don't otherwise show up in the DOM.

Examples: inline media captions that aren't rendered, language variant markup, and attributes that are transcluded. More scenarios might be added later.

Parameters
ParsoidExtensionAPI $extAPI
Element $elt: The node whose data attributes need to be examined.
Closure $proc: The processor that will process the embedded HTML. Signature: (string) -> string. This processor will be provided the HTML string as input and is expected to return a possibly modified string.
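
As an illustration of the (string) -> string contract described for $proc, here is a minimal sketch of a processor that records each embedded HTML string and returns it unchanged. The $seen collector and the sample caption string are hypothetical. The closure is invoked directly here so the snippet runs without Parsoid; within an extension it would instead be passed as the third argument to processAttributeEmbeddedHTML().

```php
<?php
// Hypothetical collector for every embedded HTML string the processor sees.
$seen = [];

// Matches the documented signature: (string) -> string.
$proc = function (string $html) use (&$seen): string {
    $seen[] = $html;   // examine (here: just record) the embedded HTML
    return $html;      // return it unchanged; a modified string is also allowed
};

// Stand-in for the HTML string that would be handed to the processor.
$result = $proc('<i>caption</i>');
```

Returning the input untouched is the safe default when the processor only needs to inspect content; returning a different string is how it would rewrite the embedded HTML.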

◆ shiftDSR()

static Wikimedia\Parsoid\Utils\ContentUtils::shiftDSR ( Env $env, Node $rootNode, callable $dsrFunc, ParsoidExtensionAPI $extAPI )

Shift the DOM Source Range (DSR) of a DOM fragment.

Parameters
Env $env
Node $rootNode
callable $dsrFunc
ParsoidExtensionAPI $extAPI
Returns
Node: The $rootNode that was passed in, to allow chaining.

◆ stripUnnecessaryWrappersAndSyntheticNodes()

static Wikimedia\Parsoid\Utils\ContentUtils::stripUnnecessaryWrappersAndSyntheticNodes ( Element $node )

Strip Parsoid-inserted section wrappers, annotation wrappers, and synthetic nodes (fallback id spans with HTML4 ids for headings, auto-generated TOC metas, and possibly others in the future) from the DOM.

Parameters
Element $node

◆ toXML()

static Wikimedia\Parsoid\Utils\ContentUtils::toXML ( Node $node, array $options = [] )

XML Serializer.

Parameters
Node $node
array $options: XMLSerializer options.
Returns
string

The documentation for this class was generated from the following file: