Parsoid
A bidirectional parser between wikitext and HTML5
Wikimedia\Parsoid\Utils\DOMUtils Class Reference

DOM utilities for querying the DOM.

Static Public Member Functions

static parseHTML (string $html, bool $validateXMLNames=false)
 Parse HTML, return the tree.
 
static visitDOM (Node $node, callable $handler, ... $args)
 This is a simplified version of the DOMTraverser.
 
static migrateChildren (Node $from, Node $to, ?Node $beforeNode=null)
 Move 'from'.childNodes to 'to', adding them before 'beforeNode'. If 'beforeNode' is null, the nodes are appended at the end.
 
static migrateChildrenBetweenDocs (Node $from, Node $to, ?Node $beforeNode=null)
 Copy 'from'.childNodes to 'to', adding them before 'beforeNode'. 'from' and 'to' belong to different documents.
 
static assertElt (?Node $node)
 Assert that this is a DOM element node.
 
static isRemexBlockNode (?Node $node)
 
static isWikitextBlockNode (?Node $node)
 
static isFormattingElt (?Node $node)
 Determine whether this is a formatting DOM element.
 
static isQuoteElt (?Node $node)
 Determine whether this is a quote DOM element.
 
static isBody (?Node $node)
 Determine whether this is the <body> DOM element.
 
static isRemoved (?Node $node)
 Determine whether this node has been removed from the DOM although its Node object still exists.
 
static pathToRoot (Node $node)
 Build path from a node to the root of the document.
 
static nodeDepth (Node $node)
 Compute the edge length of the path from $node to the root.
 
static pathToSibling (Node $node, Node $sibling, bool $left)
 Build path from a node to its passed-in sibling.
 
static inSiblingOrder (Node $n1, Node $n2)
 Check whether a node n1 comes before another node n2 in their parent's children list.
 
static isAncestorOf (Node $n1, Node $n2)
 Check that a node 'n1' is an ancestor of another node 'n2' in the DOM.
 
static findAncestorOfName (Node $node, string $name)
 Find an ancestor of $node with nodeName $name.
 
static hasNameOrHasAncestorOfName (Node $node, string $name)
 Check whether $node has $name or has an ancestor named $name.
 
static matchNameAndTypeOf (Node $n, string $name, string $typeRe)
 Determine whether the node matches the given nodeName and typeof attribute value; the typeof is given as a regular expression.
 
static hasNameAndTypeOf (Node $n, string $name, string $type)
 Determine whether the node matches the given nodeName and typeof attribute value; the typeof is given as string.
 
static matchTypeOf (Node $n, string $typeRe)
 Determine whether the node matches the given typeof attribute value.
 
static matchRel (Node $n, string $relRe)
 Determine whether the node matches the given rel attribute value.
 
static hasTypeOf (Node $n, string $type)
 Determine whether the node matches the given typeof attribute value.
 
static hasRel (Node $n, string $rel)
 Determine whether the node matches the given rel attribute value.
 
static hasClass (Element $element, string $regex)
 
static addTypeOf (Element $node, string $type, bool $prepend=false)
 Add a type to the typeof attribute.
 
static addRel (Element $node, string $rel)
 Add a type to the rel attribute.
 
static removeTypeOf (Element $node, string $type)
 Remove a type from the typeof attribute.
 
static removeRel (Element $node, string $rel)
 Remove a type from the rel attribute.
 
static isFosterablePosition (?Node $n)
 Check whether node is in a fosterable position.
 
static isHeading (?Node $n)
 Check whether node is a heading.
 
static isList (?Node $n)
 Check whether node is a list.
 
static isListItem (?Node $n)
 Check whether node is a list item.
 
static isListOrListItem (?Node $n)
 Check whether node is a list or list item.
 
static isNestedInListItem (?Node $n)
 Check whether node is nested in a list item.
 
static isNestedListOrListItem (?Node $n)
 Check whether node is a nested list or a list item.
 
static isMarkerMeta (Node $n, string $type)
 Check a node to see whether it's a meta with some typeof.
 
static hasElementChild (Node $node)
 Check whether a node has any children that are elements.
 
static hasBlockElementDescendant (Node $node)
 Check if a node has a block-level element descendant.
 
static isIEW (?Node $node)
 Is a node representing inter-element whitespace?
 
static isDocumentFragment (?Node $node)
 Is a node a document fragment?
 
static atTheTop (?Node $node)
 Is a node at the top?
 
static allChildrenAreTextOrComments (Node $node)
 Are all children of this node text or comment nodes?
 
static treeHasElement (Node $node, string $tagName, bool $checkRoot=false)
 Check if the DOM subtree rooted at node has an element with tag name 'tagName'. By default, the root node is not checked.
 
static isTableTag (Node $node)
 Is node a table tag (table, tbody, td, tr, etc.)?
 
static selectMediaElt (Element $node)
 Returns a media element nested in node.
 
static findHttpEquivHeaders (Document $doc)
 Extract http-equiv headers from the HTML, including content-language and vary headers, if present.
 
static addHttpEquivHeaders (Document $doc, array $headers)
 Add or replace http-equiv headers in the HTML <head>.
 
static extractInlinedContentVersion (Document $doc)
 
static addAttributes (Element $elt, array $attrs)
 Add attributes to a node element.
 
static appendToHead (Document $document, string $tagName, array $attrs=[])
 Create an element in the document head with the given attrs.
 
static getFragmentInnerHTML (DocumentFragment $frag)
 innerHTML and outerHTML are not defined on DocumentFragment.
 
static setFragmentInnerHTML (DocumentFragment $frag, string $html)
 innerHTML and outerHTML are not defined on DocumentFragment.
 
static parseHTMLToFragment (Document $doc, string $html)
 
static isRawTextElement (Node $node)
 
static hasBlockTag (Node $n)
 Is 'n' a block tag, or does the subtree rooted at 'n' have a block tag in it?
 
static attributes (Element $element)
 Get an associative array of attributes, suitable for serialization.
 
static isMetaDataTag (Element $node)
 
static stripPWrapper (string $ret)
 Strip a paragraph wrapper, if any, before parsing HTML to DOM.
 

Detailed Description

DOM utilities for querying the DOM.

This is largely independent of Parsoid although some Parsoid details (TokenUtils, inline content version) have snuck in.

Member Function Documentation

◆ addAttributes()

static Wikimedia\Parsoid\Utils\DOMUtils::addAttributes ( Element $elt,
array $attrs )
static

Add attributes to a node element.

Parameters
Element $elt element
array $attrs attributes

◆ addHttpEquivHeaders()

static Wikimedia\Parsoid\Utils\DOMUtils::addHttpEquivHeaders ( Document $doc,
array $headers )
static

Add or replace http-equiv headers in the HTML <head>.

This is used for content-language and vary headers, among possible others.

Parameters
Document $doc The HTML document to update
array<string,string|string[]> $headers An array mapping HTTP header names (which are case-insensitive) to new values. If an array of values is provided, they will be joined with commas.
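
The value handling described for $headers can be pictured with a small standalone sketch. This is not Parsoid's code: flattenHeaderValues() is a hypothetical helper, and both the lower-casing of names and the exact separator (a bare comma) are assumptions made for illustration.

```php
<?php
// Hypothetical helper (not part of Parsoid): flattens a header map so that
// every value is a single string, joining array values with commas.
function flattenHeaderValues(array $headers): array {
    $flat = [];
    foreach ($headers as $name => $value) {
        // Assumption: names are folded to lower case because HTTP header
        // names are case-insensitive.
        $flat[strtolower($name)] = is_array($value) ? implode(',', $value) : $value;
    }
    return $flat;
}

$flat = flattenHeaderValues([
    'Content-Language' => 'en',
    'Vary' => ['Accept', 'Accept-Language'],
]);
// $flat['vary'] is now the single string "Accept,Accept-Language".
```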

◆ addRel()

static Wikimedia\Parsoid\Utils\DOMUtils::addRel ( Element $node,
string $rel )
static

Add a type to the rel attribute.

This method should almost always be used instead of setAttribute, to ensure we don't overwrite existing rel information.

Parameters
Element $node node
string $rel type

◆ addTypeOf()

static Wikimedia\Parsoid\Utils\DOMUtils::addTypeOf ( Element $node,
string $type,
bool $prepend = false )
static

Add a type to the typeof attribute.

This method should almost always be used instead of setAttribute, to ensure we don't overwrite existing typeof information.

Parameters
Element $node node
string $type type
bool $prepend If true, adds value to start, rather than end. Use of this option in new code is discouraged.
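
To illustrate why a dedicated helper is safer than a plain setAttribute call, here is a minimal sketch that uses PHP's built-in DOMElement as a stand-in for Parsoid's Element. addTokenToAttribute() is a hypothetical name, the typeof values are sample data, and skipping tokens that are already present is an assumption rather than documented behaviour.

```php
<?php
// Hypothetical stand-in for addTypeOf()/addRel(): adds one token to a
// space-separated attribute while keeping the tokens that are already there.
function addTokenToAttribute(DOMElement $el, string $attr, string $token, bool $prepend = false): void {
    $current = trim($el->getAttribute($attr));
    $tokens = $current === '' ? [] : preg_split('/\s+/', $current);
    if (in_array($token, $tokens, true)) {
        return; // Assumption: a token that is already present is not repeated.
    }
    if ($prepend) {
        array_unshift($tokens, $token);
    } else {
        $tokens[] = $token;
    }
    $el->setAttribute($attr, implode(' ', $tokens));
}

$doc = new DOMDocument();
$span = $doc->createElement('span');
$span->setAttribute('typeof', 'mw:Transclusion');

addTokenToAttribute($span, 'typeof', 'mw:Image');             // appended at the end
addTokenToAttribute($span, 'typeof', 'mw:Placeholder', true); // prepended at the start
$typeof = $span->getAttribute('typeof');
```

A direct setAttribute('typeof', 'mw:Image') would have discarded the existing value, which is the situation the note above warns about.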

◆ allChildrenAreTextOrComments()

static Wikimedia\Parsoid\Utils\DOMUtils::allChildrenAreTextOrComments ( Node $node)
static

Are all children of this node text or comment nodes?

Parameters
Node $node
Returns
bool

◆ appendToHead()

static Wikimedia\Parsoid\Utils\DOMUtils::appendToHead ( Document $document,
string $tagName,
array $attrs = [] )
static

Create an element in the document head with the given attrs.

Creates the head element in the document if needed.

Parameters
Document $document
string $tagName
array $attrs
Returns
Element The newly-appended Element

◆ assertElt()

static Wikimedia\Parsoid\Utils\DOMUtils::assertElt ( ?Node $node)
static

Assert that this is a DOM element node.

This is primarily to help Phan analyze variable types; the method carries the annotation @phan-assert Element $node.

Parameters
?Node $node
Returns
bool Always returns true

◆ atTheTop()

static Wikimedia\Parsoid\Utils\DOMUtils::atTheTop ( ?Node $node)
static

Is a node at the top?

Parameters
?Node $node
Returns
bool

◆ attributes()

static Wikimedia\Parsoid\Utils\DOMUtils::attributes ( Element $element)
static

Get an associative array of attributes, suitable for serialization.

Add the xmlns attribute if available, to work around PHP's surprising behavior with the xmlns attribute: HTML is not an XML document, but various parts of PHP (including our misnamed XMLSerializer) pretend that it is, sort of.

Parameters
Element $element
Returns
array<string,string>
See also
https://phabricator.wikimedia.org/T235295

◆ extractInlinedContentVersion()

static Wikimedia\Parsoid\Utils\DOMUtils::extractInlinedContentVersion ( Document $doc)
static
Parameters
Document $doc
Returns
string|null

◆ findAncestorOfName()

static Wikimedia\Parsoid\Utils\DOMUtils::findAncestorOfName ( Node $node,
string $name )
static

Find an ancestor of $node with nodeName $name.

Parameters
Node $node
string $name
Returns
?Element

◆ findHttpEquivHeaders()

static Wikimedia\Parsoid\Utils\DOMUtils::findHttpEquivHeaders ( Document $doc)
static

Extract http-equiv headers from the HTML, including content-language and vary headers, if present.

Parameters
Document $doc
Returns
array<string,string>

◆ getFragmentInnerHTML()

static Wikimedia\Parsoid\Utils\DOMUtils::getFragmentInnerHTML ( DocumentFragment $frag)
static

innerHTML and outerHTML are not defined on DocumentFragment.

Defined similarly to DOMCompat::getInnerHTML()

Parameters
DocumentFragment $frag
Returns
string

◆ hasBlockElementDescendant()

static Wikimedia\Parsoid\Utils\DOMUtils::hasBlockElementDescendant ( Node $node)
static

Check if a node has a block-level element descendant.

Parameters
Node $node
Returns
bool

◆ hasBlockTag()

static Wikimedia\Parsoid\Utils\DOMUtils::hasBlockTag ( Node $n)
static

Is 'n' a block tag, or does the subtree rooted at 'n' have a block tag in it?

Parameters
Node $n
Returns
bool

◆ hasClass()

static Wikimedia\Parsoid\Utils\DOMUtils::hasClass ( Element $element,
string $regex )
static
Parameters
Element $element
string $regex Partial regular expression, e.g. "foo|bar"
Returns
bool
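
One plausible reading of "partial regular expression" is that the fragment is wrapped so that it must match a whole class name. The sketch below shows that reading with PHP's built-in DOMElement. classMatches() is a hypothetical name, and the wrapping pattern is an assumption, not a statement of how hasClass() is implemented.

```php
<?php
// Hypothetical stand-in for hasClass(): wraps a partial regex such as
// "foo|bar" so that it only matches complete, whitespace-delimited class names.
function classMatches(DOMElement $el, string $partialRegex): bool {
    $value = $el->getAttribute('class');
    // Assumption: the fragment contains no "/" delimiter of its own.
    $pattern = '/(?:^|\s)(?:' . $partialRegex . ')(?:\s|$)/';
    return preg_match($pattern, $value) === 1;
}

$doc = new DOMDocument();
$div = $doc->createElement('div');
$div->setAttribute('class', 'foo baz');

$matchesAlternative = classMatches($div, 'foo|bar'); // "foo" is one of the class names
$matchesPrefixOnly = classMatches($div, 'ba');       // "ba" is only a prefix of "baz"
```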

◆ hasElementChild()

static Wikimedia\Parsoid\Utils\DOMUtils::hasElementChild ( Node $node)
static

Check whether a node has any children that are elements.

Parameters
Node $node
Returns
bool

◆ hasNameAndTypeOf()

static Wikimedia\Parsoid\Utils\DOMUtils::hasNameAndTypeOf ( Node $n,
string $name,
string $type )
static

Determine whether the node matches the given nodeName and typeof attribute value; the typeof is given as string.

Parameters
Node $n
string $name node name to test for
string $type Expected value of "typeof" attribute (literal string)
Returns
bool True if the node matches.

◆ hasNameOrHasAncestorOfName()

static Wikimedia\Parsoid\Utils\DOMUtils::hasNameOrHasAncestorOfName ( Node $node,
string $name )
static

Check whether $node has $name or has an ancestor named $name.

Parameters
Node $node
string $name
Returns
bool

◆ hasRel()

static Wikimedia\Parsoid\Utils\DOMUtils::hasRel ( Node $n,
string $rel )
static

Determine whether the node matches the given rel attribute value.

Parameters
Node $n
string $rel Expected value of "rel" attribute, as a literal string.
Returns
bool True if the node matches.

◆ hasTypeOf()

static Wikimedia\Parsoid\Utils\DOMUtils::hasTypeOf ( Node $n,
string $type )
static

Determine whether the node matches the given typeof attribute value.

Parameters
Node $n
string $type Expected value of "typeof" attribute, as a literal string.
Returns
bool True if the node matches.

◆ inSiblingOrder()

static Wikimedia\Parsoid\Utils\DOMUtils::inSiblingOrder ( Node $n1,
Node $n2 )
static

Check whether a node n1 comes before another node n2 in their parent's children list.

Parameters
Node $n1 The node you expect to come first.
Node $n2 Expected later sibling.
Returns
bool

◆ isAncestorOf()

static Wikimedia\Parsoid\Utils\DOMUtils::isAncestorOf ( Node $n1,
Node $n2 )
static

Check that a node 'n1' is an ancestor of another node 'n2' in the DOM.

Returns true if n1 === n2.

Parameters
Node $n1 The suspected ancestor.
Node $n2 The suspected descendant.
Returns
bool
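
The inclusive behaviour (a node counts as its own ancestor) can be sketched by walking parent links. This uses PHP's built-in DOMNode as a stand-in. isAncestorOrSelf() is a hypothetical name and the loop is an illustration, not Parsoid's implementation.

```php
<?php
// Hypothetical stand-in for isAncestorOf(): walks upward from $n2 and
// reports whether $n1 is met on the way, starting with $n2 itself.
function isAncestorOrSelf(DOMNode $n1, DOMNode $n2): bool {
    for ($n = $n2; $n !== null; $n = $n->parentNode) {
        if ($n->isSameNode($n1)) {
            return true;
        }
    }
    return false;
}

$doc = new DOMDocument();
$body = $doc->createElement('body');
$doc->appendChild($body);
$p = $doc->createElement('p');
$body->appendChild($p);
$text = $doc->createTextNode('hi');
$p->appendChild($text);

$bodyAboveText = isAncestorOrSelf($body, $text); // body > p > text
$selfCounts = isAncestorOrSelf($p, $p);          // same node
$textAboveBody = isAncestorOrSelf($text, $body); // wrong direction
```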

◆ isBody()

static Wikimedia\Parsoid\Utils\DOMUtils::isBody ( ?Node $node)
static

Determine whether this is the <body> DOM element.

Parameters
?Node $node
Returns
bool

◆ isDocumentFragment()

static Wikimedia\Parsoid\Utils\DOMUtils::isDocumentFragment ( ?Node $node)
static

Is a node a document fragment?

Parameters
?Node $node
Returns
bool

◆ isFormattingElt()

static Wikimedia\Parsoid\Utils\DOMUtils::isFormattingElt ( ?Node $node)
static

Determine whether this is a formatting DOM element.

Parameters
?Node $node
Returns
bool

◆ isFosterablePosition()

static Wikimedia\Parsoid\Utils\DOMUtils::isFosterablePosition ( ?Node $n)
static

Check whether node is in a fosterable position.

Parameters
?Node $n
Returns
bool

◆ isHeading()

static Wikimedia\Parsoid\Utils\DOMUtils::isHeading ( ?Node $n)
static

Check whether node is a heading.

Parameters
?Node $n
Returns
bool

◆ isIEW()

static Wikimedia\Parsoid\Utils\DOMUtils::isIEW ( ?Node $node)
static

Is a node representing inter-element whitespace?

Parameters
?Node $node
Returns
bool

◆ isList()

static Wikimedia\Parsoid\Utils\DOMUtils::isList ( ?Node $n)
static

Check whether node is a list.

Parameters
?Node $n
Returns
bool

◆ isListItem()

static Wikimedia\Parsoid\Utils\DOMUtils::isListItem ( ?Node $n)
static

Check whether node is a list item.

Parameters
?Node $n
Returns
bool

◆ isListOrListItem()

static Wikimedia\Parsoid\Utils\DOMUtils::isListOrListItem ( ?Node $n)
static

Check whether node is a list or list item.

Parameters
?Node $n
Returns
bool

◆ isMarkerMeta()

static Wikimedia\Parsoid\Utils\DOMUtils::isMarkerMeta ( Node $n,
string $type )
static

Check a node to see whether it's a meta with some typeof.

Parameters
Node $n
string $type
Returns
bool

◆ isMetaDataTag()

static Wikimedia\Parsoid\Utils\DOMUtils::isMetaDataTag ( Element $node)
static
Parameters
Element $node
Returns
bool

◆ isNestedInListItem()

static Wikimedia\Parsoid\Utils\DOMUtils::isNestedInListItem ( ?Node $n)
static

Check whether node is nested in a list item.

Parameters
?Node $n
Returns
bool

◆ isNestedListOrListItem()

static Wikimedia\Parsoid\Utils\DOMUtils::isNestedListOrListItem ( ?Node $n)
static

Check whether node is a nested list or a list item.

Parameters
?Node $n
Returns
bool

◆ isQuoteElt()

static Wikimedia\Parsoid\Utils\DOMUtils::isQuoteElt ( ?Node $node)
static

Determine whether this is a quote DOM element.

Parameters
?Node $node
Returns
bool

◆ isRawTextElement()

static Wikimedia\Parsoid\Utils\DOMUtils::isRawTextElement ( Node $node)
static
Parameters
Node $node
Returns
bool

◆ isRemexBlockNode()

static Wikimedia\Parsoid\Utils\DOMUtils::isRemexBlockNode ( ?Node $node)
static
Parameters
?Node $node
Returns
bool

◆ isRemoved()

static Wikimedia\Parsoid\Utils\DOMUtils::isRemoved ( ?Node $node)
static

Determine whether this node has been removed from the DOM although its Node object still exists.

Parameters
?Node $node
Returns
bool

◆ isTableTag()

static Wikimedia\Parsoid\Utils\DOMUtils::isTableTag ( Node $node)
static

Is node a table tag (table, tbody, td, tr, etc.)?

Parameters
Node $node
Returns
bool

◆ isWikitextBlockNode()

static Wikimedia\Parsoid\Utils\DOMUtils::isWikitextBlockNode ( ?Node $node)
static
Parameters
?Node $node
Returns
bool

◆ matchNameAndTypeOf()

static Wikimedia\Parsoid\Utils\DOMUtils::matchNameAndTypeOf ( Node $n,
string $name,
string $typeRe )
static

Determine whether the node matches the given nodeName and typeof attribute value; the typeof is given as a regular expression.

Returns the matching typeof value if the node name matches $name and the typeof attribute matches $typeRe; otherwise returns null.

Parameters
Node $n The node to test
string $name The expected nodeName of $n
string $typeRe Regular expression matching the expected value of typeof attribute.
Returns
?string The matching typeof value, or null if there is no match.

◆ matchRel()

static Wikimedia\Parsoid\Utils\DOMUtils::matchRel ( Node $n,
string $relRe )
static

Determine whether the node matches the given rel attribute value.

Parameters
Node $n The node to test
string $relRe Regular expression matching the expected value of the rel attribute.
Returns
?string The matching rel value, or null if there is no match.

◆ matchTypeOf()

static Wikimedia\Parsoid\Utils\DOMUtils::matchTypeOf ( Node $n,
string $typeRe )
static

Determine whether the node matches the given typeof attribute value.

Parameters
Node $n The node to test
string $typeRe Regular expression matching the expected value of the typeof attribute.
Returns
?string The matching typeof value, or null if there is no match.
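
The "matching value or null" contract shared by the match*() methods can be sketched as a token-by-token regex test. The code uses PHP's built-in DOM classes as a stand-in. matchTokenInAttribute() is a hypothetical name, and it assumes that the pattern is a complete PCRE expression with delimiters and that the attribute is split on whitespace before matching.

```php
<?php
// Hypothetical stand-in for matchTypeOf()/matchRel(): returns the first
// whitespace-separated token of $attr that matches $re, or null if none does.
function matchTokenInAttribute(DOMNode $n, string $attr, string $re): ?string {
    if (!($n instanceof DOMElement) || !$n->hasAttribute($attr)) {
        return null; // non-elements and elements without the attribute never match
    }
    $tokens = preg_split('/\s+/', $n->getAttribute($attr), -1, PREG_SPLIT_NO_EMPTY);
    foreach ($tokens as $token) {
        if (preg_match($re, $token) === 1) {
            return $token;
        }
    }
    return null;
}

$doc = new DOMDocument();
$span = $doc->createElement('span');
$span->setAttribute('typeof', 'mw:Transclusion mw:Extension/ref'); // sample values

$hit = matchTokenInAttribute($span, 'typeof', '#^mw:Extension/#');
$miss = matchTokenInAttribute($span, 'typeof', '#^mw:Image$#');
```

Returning the matched token rather than a bare boolean lets a caller branch on which value was found without reading the attribute a second time.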

◆ migrateChildren()

static Wikimedia\Parsoid\Utils\DOMUtils::migrateChildren ( Node $from,
Node $to,
?Node $beforeNode = null )
static

Move 'from'.childNodes to 'to', adding them before 'beforeNode'. If 'beforeNode' is null, the nodes are appended at the end.

Parameters
Node $from Source node. Children will be removed.
Node $to Destination node. Children of $from will be added here.
?Node $beforeNode Add the children before this node.
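
The move semantics can be sketched with PHP's built-in DOM, where insertBefore() with a null reference node appends. moveChildren() is a hypothetical name used for illustration. The sketch assumes both nodes belong to the same document and is not Parsoid's implementation.

```php
<?php
// Hypothetical stand-in for migrateChildren(): moves every child of $from
// into $to, in order, in front of $beforeNode (or at the end when it is null).
function moveChildren(DOMNode $from, DOMNode $to, ?DOMNode $beforeNode = null): void {
    while ($from->firstChild !== null) {
        // insertBefore() detaches the child from $from before re-inserting it.
        $to->insertBefore($from->firstChild, $beforeNode);
    }
}

$doc = new DOMDocument();
$from = $doc->createElement('div');
$from->appendChild($doc->createElement('a'));
$from->appendChild($doc->createElement('b'));
$to = $doc->createElement('section');
$marker = $to->appendChild($doc->createElement('x'));

moveChildren($from, $to, $marker);

$order = [];
foreach ($to->childNodes as $child) {
    $order[] = $child->nodeName;
}
// $order lists the moved children first, followed by the marker element.
```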

◆ migrateChildrenBetweenDocs()

static Wikimedia\Parsoid\Utils\DOMUtils::migrateChildrenBetweenDocs ( Node $from,
Node $to,
?Node $beforeNode = null )
static

Copy 'from'.childNodes to 'to', adding them before 'beforeNode'. 'from' and 'to' belong to different documents.

If 'beforeNode' is null, the nodes are appended at the end.

Parameters
Node $from
Node $to
?Node $beforeNode

◆ nodeDepth()

static Wikimedia\Parsoid\Utils\DOMUtils::nodeDepth ( Node $node)
static

Compute the edge length of the path from $node to the root.

Root document is at depth 0, <html> at 1, <body> at 2.

Parameters
Node $node
Returns
int
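
The depth convention stated above (document at 0, <html> at 1, <body> at 2) amounts to counting parent links. A minimal sketch with PHP's built-in DOM follows. depthOf() is a hypothetical name, not the Parsoid method.

```php
<?php
// Hypothetical stand-in for nodeDepth(): counts the edges between $node and
// the root of its tree by following parentNode links.
function depthOf(DOMNode $node): int {
    $depth = 0;
    while ($node->parentNode !== null) {
        $node = $node->parentNode;
        $depth++;
    }
    return $depth;
}

$doc = new DOMDocument();
$html = $doc->createElement('html');
$doc->appendChild($html);
$body = $doc->createElement('body');
$html->appendChild($body);

$depths = [depthOf($doc), depthOf($html), depthOf($body)];
```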

◆ parseHTML()

static Wikimedia\Parsoid\Utils\DOMUtils::parseHTML ( string $html,
bool $validateXMLNames = false )
static

Parse HTML, return the tree.

Parameters
string $html
bool $validateXMLNames
Returns
Document

◆ parseHTMLToFragment()

static Wikimedia\Parsoid\Utils\DOMUtils::parseHTMLToFragment ( Document $doc,
string $html )
static
Parameters
Document $doc
string $html
Returns
DocumentFragment

◆ pathToRoot()

static Wikimedia\Parsoid\Utils\DOMUtils::pathToRoot ( Node $node)
static

Build path from a node to the root of the document.

Parameters
Node $node
Returns
Node[] Path including all nodes from $node to the root of the document

◆ pathToSibling()

static Wikimedia\Parsoid\Utils\DOMUtils::pathToSibling ( Node $node,
Node $sibling,
bool $left )
static

Build path from a node to its passed-in sibling.

Return will not include the passed-in sibling.

Parameters
Node $node
Node $sibling
bool $left Indicates whether to go backwards, i.e. use previousSibling instead of nextSibling.
Returns
Node[]
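
The role of $left can be sketched as a sibling walk that stops just short of the target. The code uses PHP's built-in DOM as a stand-in. siblingPath() is a hypothetical name, and including the starting node in the result is an assumption, since the text above only says that the passed-in sibling is excluded.

```php
<?php
// Hypothetical stand-in for pathToSibling(): collects nodes from $node up to,
// but not including, $sibling, walking left or right along the sibling chain.
function siblingPath(DOMNode $node, DOMNode $sibling, bool $left): array {
    $path = [];
    $n = $node;
    while ($n !== null && !$n->isSameNode($sibling)) {
        $path[] = $n; // Assumption: the starting node is part of the path.
        $n = $left ? $n->previousSibling : $n->nextSibling;
    }
    return $path;
}

$doc = new DOMDocument();
$parent = $doc->createElement('div');
$a = $parent->appendChild($doc->createElement('a'));
$b = $parent->appendChild($doc->createElement('b'));
$c = $parent->appendChild($doc->createElement('c'));

$rightward = array_map(fn(DOMNode $n) => $n->nodeName, siblingPath($a, $c, false));
$leftward = array_map(fn(DOMNode $n) => $n->nodeName, siblingPath($c, $a, true));
```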

◆ removeRel()

static Wikimedia\Parsoid\Utils\DOMUtils::removeRel ( Element $node,
string $rel )
static

Remove a type from the rel attribute.

Parameters
Element $node node
string $rel rel

◆ removeTypeOf()

static Wikimedia\Parsoid\Utils\DOMUtils::removeTypeOf ( Element $node,
string $type )
static

Remove a type from the typeof attribute.

Parameters
Element $node node
string $type type

◆ selectMediaElt()

static Wikimedia\Parsoid\Utils\DOMUtils::selectMediaElt ( Element $node)
static

Returns a media element nested in node.

Parameters
Element $node
Returns
Element|null

◆ setFragmentInnerHTML()

static Wikimedia\Parsoid\Utils\DOMUtils::setFragmentInnerHTML ( DocumentFragment $frag,
string $html )
static

innerHTML and outerHTML are not defined on DocumentFragment.

See also
DOMCompat::setInnerHTML() for the Element version
Parameters
DocumentFragment $frag
string $html

◆ treeHasElement()

static Wikimedia\Parsoid\Utils\DOMUtils::treeHasElement ( Node $node,
string $tagName,
bool $checkRoot = false )
static

Check if the DOM subtree rooted at node has an element with tag name 'tagName'. By default, the root node is not checked.

Parameters
Node $node The DOM node whose tree should be checked
string $tagName Tag name to look for
bool $checkRoot Should the root be checked?
Returns
bool
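
The effect of $checkRoot is easiest to see in a small recursive search. The sketch below uses PHP's built-in DOM as a stand-in. subtreeHasElement() is a hypothetical name and the depth-first strategy is an assumption, not a description of Parsoid's code.

```php
<?php
// Hypothetical stand-in for treeHasElement(): depth-first search for an
// element named $tagName, testing the root itself only when $checkRoot is true.
function subtreeHasElement(DOMNode $node, string $tagName, bool $checkRoot = false): bool {
    if ($checkRoot && $node instanceof DOMElement && $node->nodeName === $tagName) {
        return true;
    }
    for ($child = $node->firstChild; $child !== null; $child = $child->nextSibling) {
        // Descendants are always tested, whatever the caller passed for the root.
        if (subtreeHasElement($child, $tagName, true)) {
            return true;
        }
    }
    return false;
}

$doc = new DOMDocument();
$outer = $doc->createElement('div');
$p = $outer->appendChild($doc->createElement('p'));
$p->appendChild($doc->createElement('span'));

$findsNestedSpan = subtreeHasElement($outer, 'span');
$rootSkipped = subtreeHasElement($outer, 'div');       // root is not checked by default
$rootChecked = subtreeHasElement($outer, 'div', true); // root is checked on request
```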

◆ visitDOM()

static Wikimedia\Parsoid\Utils\DOMUtils::visitDOM ( Node $node,
callable $handler,
... $args )
static

This is a simplified version of the DOMTraverser.

Consider using that before making this more complex.

FIXME: Move to DOMTraverser OR create a new class?

Parameters
Node $node
callable $handler
mixed ...$args
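
How the variadic $args reach the handler can be sketched with a plain recursive walk over PHP's built-in DOM. walkTree() is a hypothetical name. The pre-order visiting order and the handler signature (node first, then the extra arguments) are assumptions made for illustration.

```php
<?php
// Hypothetical stand-in for visitDOM(): calls $handler on $node and then on
// every descendant, forwarding the extra arguments unchanged each time.
function walkTree(DOMNode $node, callable $handler, ...$args): void {
    $handler($node, ...$args);
    $child = $node->firstChild;
    while ($child !== null) {
        $next = $child->nextSibling; // saved first in case the handler moves $child
        walkTree($child, $handler, ...$args);
        $child = $next;
    }
}

$doc = new DOMDocument();
$body = $doc->createElement('body');
$doc->appendChild($body);
$p = $body->appendChild($doc->createElement('p'));
$p->appendChild($doc->createTextNode('hi'));
$body->appendChild($doc->createElement('span'));

$names = [];
walkTree($body, function (DOMNode $n, string $prefix) use (&$names): void {
    $names[] = $prefix . $n->nodeName;
}, 'node:');
// $names now holds one prefixed entry per visited node, parents before children.
```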
