Parsoid
A bidirectional parser between wikitext and HTML5
Parsoid\Utils\DOMUtils Class Reference

DOM utilities for querying the DOM. More...

Static Public Member Functions

static parseHTML (string $html)
 Parse HTML, return the tree. More...
 
static visitDOM (DOMNode $node, callable $handler, ...$args)
 This is a simplified version of the DOMTraverser. More...
 
static migrateChildren (DOMNode $from, DOMNode $to, DOMNode $beforeNode=null)
 Move 'from'.childNodes to 'to', adding them before 'beforeNode'. If 'beforeNode' is null, the nodes are appended at the end. More...
 
static migrateChildrenBetweenDocs (DOMNode $from, DOMNode $to, DOMNode $beforeNode=null)
 Copy 'from'.childNodes to 'to', adding them before 'beforeNode'. 'from' and 'to' belong to different documents. More...
 
static isElt (?DOMNode $node)
 Check whether this is a DOM element node. More...
 
static assertElt (?DOMNode $node)
 Assert that this is a DOM element node. More...
 
static isText (?DOMNode $node)
 Check whether this is a DOM text node. More...
 
static isComment (?DOMNode $node)
 Check whether this is a DOM comment node. More...
 
static isBlockNode (?DOMNode $node)
 Determine whether this is a block-level DOM element. More...
 
static isFormattingElt (?DOMNode $node)
 Determine whether this is a formatting DOM element. More...
 
static isQuoteElt (?DOMNode $node)
 Determine whether this is a quote DOM element. More...
 
static isBody (?DOMNode $node)
 Determine whether this is a body DOM element. More...
 
static isRemoved (?DOMNode $node)
 Determine whether this node has been removed from the DOM even though its DOMNode object still exists. More...
 
static hasNChildren (DOMNode $node, int $nchildren, bool $countDiffMarkers=false)
 Test the number of children this node has (PORT-FIXME: is this necessary with PHP DOM, unlike Domino in JS?). More...
 
static pathToAncestor (DOMNode $node, DOMNode $ancestor=null)
 Build path from a node to its passed-in ancestor. More...
 
static pathToRoot (DOMNode $node)
 Build path from a node to the root of the document. More...
 
static pathToSibling (DOMNode $node, DOMNode $sibling, bool $left)
 Build path from a node to its passed-in sibling. More...
 
static inSiblingOrder (DOMNode $n1, DOMNode $n2)
 Check whether a node n1 comes before another node n2 in their parent's children list. More...
 
static isAncestorOf (DOMNode $n1, DOMNode $n2)
 Check that a node 'n1' is an ancestor of another node 'n2' in the DOM. More...
 
static hasAncestorOfName (DOMNode $node, string $name)
 Check whether node has an ancestor named name. More...
 
static matchNameAndTypeOf (DOMNode $n, string $name, string $typeRe)
 Determine whether the node matches the given nodeName and typeof attribute value. More...
 
static hasNameAndTypeOf (DOMNode $n, string $name, string $type)
 Determine whether the node matches the given nodeName and typeof attribute value; the typeof is given as string. More...
 
static matchTypeOf (DOMNode $n, string $typeRe)
 Determine whether the node matches the given typeof attribute value. More...
 
static hasTypeOf (DOMNode $n, string $type)
 Determine whether the node matches the given typeof attribute value. More...
 
static isFosterablePosition (?DOMNode $n)
 Check whether node is in a fosterable position. More...
 
static isList (?DOMNode $n)
 Check whether node is a list. More...
 
static isListItem (?DOMNode $n)
 Check whether node is a list item. More...
 
static isListOrListItem (?DOMNode $n)
 Check whether node is a list or list item. More...
 
static isNestedInListItem (?DOMNode $n)
 Check whether node is nested in a list item. More...
 
static isNestedListOrListItem (?DOMNode $n)
 Check whether node is a nested list or a list item. More...
 
static isMarkerMeta (DOMNode $n, string $type)
 Check a node to see whether it's a meta with some typeof. More...
 
static isDiffMarker (?DOMNode $node, string $mark=null)
 Check a node to see whether it's a diff marker. More...
 
static hasElementChild (DOMNode $node)
 Check whether a node has any children that are elements. More...
 
static hasBlockElementDescendant (DOMNode $node)
 Check if a node has a block-level element descendant. More...
 
static isIEW (?DOMNode $node)
 Is a node representing inter-element whitespace? More...
 
static isDocumentFragment (?DOMNode $node)
 Is a node a document fragment? More...
 
static atTheTop (?DOMNode $node)
 Is a node at the top? More...
 
static isContentNode (?DOMNode $node)
 Is a node a content node? More...
 
static firstNonSepChild (DOMNode $node)
 Get the first child element or non-IEW text node, ignoring whitespace-only text nodes, comments, and deleted nodes. More...
 
static lastNonSepChild (DOMNode $node)
 Get the last child element or non-IEW text node, ignoring whitespace-only text nodes, comments, and deleted nodes. More...
 
static previousNonSepSibling (DOMNode $node)
 Get the previous non-separator sibling node. More...
 
static nextNonSepSibling (DOMNode $node)
 Get the next non-separator sibling node. More...
 
static numNonDeletedChildNodes (DOMNode $node)
 Return the number of non-deleted child nodes. More...
 
static firstNonDeletedChild (DOMNode $node)
 Get the first non-deleted child of node. More...
 
static lastNonDeletedChild (DOMNode $node)
 Get the last non-deleted child of node. More...
 
static nextNonDeletedSibling (DOMNode $node)
 Get the next non-deleted sibling. More...
 
static previousNonDeletedSibling (DOMNode $node)
 Get the previous non-deleted sibling. More...
 
static allChildrenAreTextOrComments (DOMNode $node)
 Are all children of this node text or comment nodes? More...
 
static allChildrenAreText (DOMNode $node)
 Are all children of this node text nodes? More...
 
static nodeEssentiallyEmpty (DOMNode $node, bool $strict=false)
 Does node contain nothing or just non-newline whitespace? If $strict is set, any whitespace at all is disallowed. More...
 
static treeHasElement (DOMNode $node, string $tagName)
 Check if the DOM subtree rooted at node has an element with tag name 'tagName'. The root node is not checked. More...
 
static isTableTag (DOMNode $node)
 Is node a table tag (table, tbody, td, tr, etc.)? More...
 
static selectMediaElt (DOMElement $node)
 Returns a media element nested in node. More...
 
static findHttpEquivHeaders (DOMDocument $doc)
 Extract http-equiv headers from the HTML, including content-language and vary headers, if present. More...
 
static extractInlinedContentVersion (DOMDocument $doc)
 

Public Attributes

const TPL_META_TYPE_REGEXP = '/(?:^|\s)(mw:(?:Transclusion|Param)(?:\/End)?)(?=$|\s)/'
 
const FIRST_ENCAP_REGEXP
 

Detailed Description

DOM utilities for querying the DOM.

This is largely independent of Parsoid although some Parsoid details (diff markers, TokenUtils, inline content version) have snuck in.

Member Function Documentation

◆ allChildrenAreText()

static Parsoid\Utils\DOMUtils::allChildrenAreText ( DOMNode  $node)
static

Are all children of this node text nodes?

Parameters
DOMNode $node
Returns
bool

◆ allChildrenAreTextOrComments()

static Parsoid\Utils\DOMUtils::allChildrenAreTextOrComments ( DOMNode  $node)
static

Are all children of this node text or comment nodes?

Parameters
DOMNode $node
Returns
bool

◆ assertElt()

static Parsoid\Utils\DOMUtils::assertElt ( ?DOMNode  $node)
static

Assert that this is a DOM element node.

This is primarily to help phan analyze variable types, via the annotation @phan-assert DOMElement $node.

Parameters
DOMNode | null $node
Returns
bool Always returns true

◆ atTheTop()

static Parsoid\Utils\DOMUtils::atTheTop ( ?DOMNode  $node)
static

Is a node at the top?

Parameters
DOMNode | null $node
Returns
bool

◆ extractInlinedContentVersion()

static Parsoid\Utils\DOMUtils::extractInlinedContentVersion ( DOMDocument  $doc)
static
Parameters
DOMDocument $doc
Returns
string|null

◆ findHttpEquivHeaders()

static Parsoid\Utils\DOMUtils::findHttpEquivHeaders ( DOMDocument  $doc)
static

Extract http-equiv headers from the HTML, including content-language and vary headers, if present.

Parameters
DOMDocument $doc
Returns
DOMNode[]

◆ firstNonDeletedChild()

static Parsoid\Utils\DOMUtils::firstNonDeletedChild ( DOMNode  $node)
static

Get the first non-deleted child of node.

Parameters
DOMNode $node
Returns
DOMNode|null

◆ firstNonSepChild()

static Parsoid\Utils\DOMUtils::firstNonSepChild ( DOMNode  $node)
static

Get the first child element or non-IEW text node, ignoring whitespace-only text nodes, comments, and deleted nodes.

Parameters
DOMNode $node
Returns
DOMNode|null

◆ hasAncestorOfName()

static Parsoid\Utils\DOMUtils::hasAncestorOfName ( DOMNode  $node,
string  $name 
)
static

Check whether node has an ancestor named name.

Parameters
DOMNode $node
string $name
Returns
bool

◆ hasBlockElementDescendant()

static Parsoid\Utils\DOMUtils::hasBlockElementDescendant ( DOMNode  $node)
static

Check if a node has a block-level element descendant.

Parameters
DOMNode $node
Returns
bool

◆ hasElementChild()

static Parsoid\Utils\DOMUtils::hasElementChild ( DOMNode  $node)
static

Check whether a node has any children that are elements.

Parameters
DOMNode $node
Returns
bool

◆ hasNameAndTypeOf()

static Parsoid\Utils\DOMUtils::hasNameAndTypeOf ( DOMNode  $n,
string  $name,
string  $type 
)
static

Determine whether the node matches the given nodeName and typeof attribute value; the typeof is given as string.

Parameters
DOMNode $n
string $name  Node name to test for
string $type  Expected value of "typeof" attribute (literal string)
Returns
bool True if the node matches.

◆ hasNChildren()

static Parsoid\Utils\DOMUtils::hasNChildren ( DOMNode  $node,
int  $nchildren,
bool  $countDiffMarkers = false 
)
static

PORT-FIXME: Is this necessary with PHP DOM unlike Domino in JS?

Test the number of children this node has without using Node::childNodes.length. This walks the sibling list and so takes O(nchildren) time – so nchildren is expected to be small (say: 0, 1, or 2).

Skips all diff markers by default.

Parameters
DOMNode $node
int $nchildren
bool $countDiffMarkers
Returns
bool
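
The sibling-list walk described above can be sketched with PHP's built-in DOM extension. This is only an illustration, not Parsoid's implementation: the function name hasExactlyNChildren is hypothetical, the exact-equality comparison is an assumption about what "test the number of children" means, and diff-marker skipping is left out.

```php
<?php
// Hypothetical stand-in for the documented behaviour; not Parsoid's code.
// Assumption: the check is for exactly $nchildren children.
function hasExactlyNChildren(DOMNode $node, int $nchildren): bool
{
    // Walk the sibling list and stop as soon as the expected count is exceeded,
    // so the cost is bounded by $nchildren rather than by the real child count.
    for ($child = $node->firstChild; $child !== null; $child = $child->nextSibling) {
        if ($nchildren <= 0) {
            return false; // more children than expected
        }
        $nchildren--;
    }
    return $nchildren === 0;
}

$doc = new DOMDocument();
$doc->loadXML('<ul><li>a</li><li>b</li></ul>');
$ul = $doc->documentElement;
$hasTwo = hasExactlyNChildren($ul, 2);
```

Because the loop exits early, a node with thousands of children is rejected after only $nchildren + 1 steps.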

◆ hasTypeOf()

static Parsoid\Utils\DOMUtils::hasTypeOf ( DOMNode  $n,
string  $type 
)
static

Determine whether the node matches the given typeof attribute value.

Parameters
DOMNode $n
string $type  Expected value of "typeof" attribute, as a literal string.
Returns
bool True if the node matches.

◆ inSiblingOrder()

static Parsoid\Utils\DOMUtils::inSiblingOrder ( DOMNode  $n1,
DOMNode  $n2 
)
static

Check whether a node n1 comes before another node n2 in their parent's children list.

Parameters
DOMNode $n1  The node you expect to come first.
DOMNode $n2  Expected later sibling.
Returns
bool

◆ isAncestorOf()

static Parsoid\Utils\DOMUtils::isAncestorOf ( DOMNode  $n1,
DOMNode  $n2 
)
static

Check that a node 'n1' is an ancestor of another node 'n2' in the DOM.

Returns true if n1 === n2.

Parameters
DOMNode $n1  The suspected ancestor.
DOMNode $n2  The suspected descendant.
Returns
bool
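
A minimal sketch of such an ancestor check, using only PHP's built-in DOM extension. The function name isAncestorOfSketch is hypothetical and the parent-chain walk is an assumed strategy, not necessarily what Parsoid does; it does reproduce the documented rule that a node counts as its own ancestor.

```php
<?php
// Hypothetical stand-in for the documented behaviour; not Parsoid's code.
function isAncestorOfSketch(DOMNode $n1, DOMNode $n2): bool
{
    // Start at $n2 itself, so that $n1 === $n2 yields true as documented.
    for ($n = $n2; $n !== null; $n = $n->parentNode) {
        if ($n->isSameNode($n1)) {
            return true;
        }
    }
    return false;
}

$doc = new DOMDocument();
$doc->loadXML('<a><b><c/></b><d/></a>');
$a = $doc->documentElement;
$c = $doc->getElementsByTagName('c')->item(0);
$d = $doc->getElementsByTagName('d')->item(0);
$aOverC = isAncestorOfSketch($a, $c);
```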

◆ isBlockNode()

static Parsoid\Utils\DOMUtils::isBlockNode ( ?DOMNode  $node)
static

Determine whether this is a block-level DOM element.

Parameters
DOMNode | null $node
Returns
bool

◆ isBody()

static Parsoid\Utils\DOMUtils::isBody ( ?DOMNode  $node)
static

Determine whether this is a body DOM element.

Parameters
DOMNode | null $node
Returns
bool

◆ isComment()

static Parsoid\Utils\DOMUtils::isComment ( ?DOMNode  $node)
static

Check whether this is a DOM comment node.

See also
http://dom.spec.whatwg.org/#dom-node-nodetype
Parameters
DOMNode | null $node
Returns
bool

◆ isContentNode()

static Parsoid\Utils\DOMUtils::isContentNode ( ?DOMNode  $node)
static

Is a node a content node?

Parameters
DOMNode | null $node
Returns
bool

◆ isDiffMarker()

static Parsoid\Utils\DOMUtils::isDiffMarker ( ?DOMNode  $node,
string  $mark = null 
)
static

Check a node to see whether it's a diff marker.

Parameters
DOMNode | null $node
string | null $mark
Returns
bool

◆ isDocumentFragment()

static Parsoid\Utils\DOMUtils::isDocumentFragment ( ?DOMNode  $node)
static

Is a node a document fragment?

Parameters
DOMNode | null $node
Returns
bool

◆ isElt()

static Parsoid\Utils\DOMUtils::isElt ( ?DOMNode  $node)
static

Check whether this is a DOM element node.

See also
http://dom.spec.whatwg.org/#dom-node-nodetype
Parameters
DOMNode | null $node
Returns
bool
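
The node-type checks (isElt, and likewise isText and isComment) come down to comparing nodeType against the DOM node-type constants. A rough sketch with PHP's built-in DOM extension follows; the *Sketch function names are hypothetical, and the handling of null is an assumption based on the nullable parameter type.

```php
<?php
// Hypothetical stand-ins for the documented checks; not Parsoid's code.
function isEltSketch(?DOMNode $node): bool
{
    return $node !== null && $node->nodeType === XML_ELEMENT_NODE;
}

function isTextSketch(?DOMNode $node): bool
{
    return $node !== null && $node->nodeType === XML_TEXT_NODE;
}

function isCommentSketch(?DOMNode $node): bool
{
    return $node !== null && $node->nodeType === XML_COMMENT_NODE;
}

$doc = new DOMDocument();
$doc->loadXML('<p>text<!--note--></p>');
$p = $doc->documentElement;   // element node
$text = $p->firstChild;       // text node
$comment = $p->lastChild;     // comment node
```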

◆ isFormattingElt()

static Parsoid\Utils\DOMUtils::isFormattingElt ( ?DOMNode  $node)
static

Determine whether this is a formatting DOM element.

Parameters
DOMNode | null $node
Returns
bool

◆ isFosterablePosition()

static Parsoid\Utils\DOMUtils::isFosterablePosition ( ?DOMNode  $n)
static

Check whether node is in a fosterable position.

Parameters
DOMNode | null $n
Returns
bool

◆ isIEW()

static Parsoid\Utils\DOMUtils::isIEW ( ?DOMNode  $node)
static

Is a node representing inter-element whitespace?

Parameters
DOMNode | null $node
Returns
bool

◆ isList()

static Parsoid\Utils\DOMUtils::isList ( ?DOMNode  $n)
static

Check whether node is a list.

Parameters
DOMNode | null $n
Returns
bool

◆ isListItem()

static Parsoid\Utils\DOMUtils::isListItem ( ?DOMNode  $n)
static

Check whether node is a list item.

Parameters
DOMNode | null $n
Returns
bool

◆ isListOrListItem()

static Parsoid\Utils\DOMUtils::isListOrListItem ( ?DOMNode  $n)
static

Check whether node is a list or list item.

Parameters
DOMNode | null $n
Returns
bool

◆ isMarkerMeta()

static Parsoid\Utils\DOMUtils::isMarkerMeta ( DOMNode  $n,
string  $type 
)
static

Check a node to see whether it's a meta with some typeof.

Parameters
DOMNode $n
string $type
Returns
bool

◆ isNestedInListItem()

static Parsoid\Utils\DOMUtils::isNestedInListItem ( ?DOMNode  $n)
static

Check whether node is nested in a list item.

Parameters
DOMNode | null $n
Returns
bool

◆ isNestedListOrListItem()

static Parsoid\Utils\DOMUtils::isNestedListOrListItem ( ?DOMNode  $n)
static

Check whether node is a nested list or a list item.

Parameters
DOMNode | null $n
Returns
bool

◆ isQuoteElt()

static Parsoid\Utils\DOMUtils::isQuoteElt ( ?DOMNode  $node)
static

Determine whether this is a quote DOM element.

Parameters
DOMNode | null $node
Returns
bool

◆ isRemoved()

static Parsoid\Utils\DOMUtils::isRemoved ( ?DOMNode  $node)
static

Determine whether this node has been removed from the DOM even though its DOMNode object still exists.

Parameters
DOMNode | null $node
Returns
bool

◆ isTableTag()

static Parsoid\Utils\DOMUtils::isTableTag ( DOMNode  $node)
static

Is node a table tag (table, tbody, td, tr, etc.)?

Parameters
DOMNode $node
Returns
bool

◆ isText()

static Parsoid\Utils\DOMUtils::isText ( ?DOMNode  $node)
static

Check whether this is a DOM text node.

See also
http://dom.spec.whatwg.org/#dom-node-nodetype
Parameters
DOMNode | null $node
Returns
bool

◆ lastNonDeletedChild()

static Parsoid\Utils\DOMUtils::lastNonDeletedChild ( DOMNode  $node)
static

Get the last non-deleted child of node.

Parameters
DOMNode $node
Returns
DOMNode|null

◆ lastNonSepChild()

static Parsoid\Utils\DOMUtils::lastNonSepChild ( DOMNode  $node)
static

Get the last child element or non-IEW text node, ignoring whitespace-only text nodes, comments, and deleted nodes.

Parameters
DOMNode $node
Returns
DOMNode|null

◆ matchNameAndTypeOf()

static Parsoid\Utils\DOMUtils::matchNameAndTypeOf ( DOMNode  $n,
string  $name,
string  $typeRe 
)
static

Determine whether the node matches the given nodeName and typeof attribute value.

The node matches when its name equals $name and its typeof attribute matches $typeRe.

Parameters
DOMNode $n  The node to test
string $name  The expected nodeName of $n
string $typeRe  Regular expression matching the expected value of typeof attribute.
Returns
?string The matching typeof value, or null if there is no match.

◆ matchTypeOf()

static Parsoid\Utils\DOMUtils::matchTypeOf ( DOMNode  $n,
string  $typeRe 
)
static

Determine whether the node matches the given typeof attribute value.

Parameters
DOMNode $n  The node to test
string $typeRe  Regular expression matching the expected value of the typeof attribute.
Returns
?string The matching typeof value, or null if there is no match.
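
To make the regular-expression form concrete, here is a hedged sketch that applies the TPL_META_TYPE_REGEXP pattern listed under Public Attributes to a node's typeof attribute. The function matchTypeOfSketch is hypothetical, and returning the first capture group (falling back to the whole match) is an assumption about what "the matching typeof value" means.

```php
<?php
// Pattern copied from the TPL_META_TYPE_REGEXP constant documented on this page.
const TPL_META_TYPE_REGEXP_SKETCH = '/(?:^|\s)(mw:(?:Transclusion|Param)(?:\/End)?)(?=$|\s)/';

// Hypothetical stand-in for the documented behaviour; not Parsoid's code.
function matchTypeOfSketch(DOMNode $n, string $typeRe): ?string
{
    if (!($n instanceof DOMElement) || !$n->hasAttribute('typeof')) {
        return null;
    }
    if (preg_match($typeRe, $n->getAttribute('typeof'), $m) === 1) {
        // Assumption: prefer the first capture group, else the whole match.
        return $m[1] ?? $m[0];
    }
    return null;
}

$doc = new DOMDocument();
$doc->loadXML(
    '<div><meta typeof="mw:Transclusion/End extra"/><meta typeof="mw:TransclusionX"/></div>'
);
$metas = $doc->getElementsByTagName('meta');
$first = matchTypeOfSketch($metas->item(0), TPL_META_TYPE_REGEXP_SKETCH);
$second = matchTypeOfSketch($metas->item(1), TPL_META_TYPE_REGEXP_SKETCH);
```

The lookahead (?=$|\s) is what keeps mw:TransclusionX from matching: the type token has to end at whitespace or at the end of the attribute value.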

◆ migrateChildren()

static Parsoid\Utils\DOMUtils::migrateChildren ( DOMNode  $from,
DOMNode  $to,
DOMNode  $beforeNode = null 
)
static

Move 'from'.childNodes to 'to', adding them before 'beforeNode'. If 'beforeNode' is null, the nodes are appended at the end.

Parameters
DOMNode $from  Source node. Children will be removed.
DOMNode $to  Destination node. Children of $from will be added here.
DOMNode | null $beforeNode  Add the children before this node.
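
The move semantics can be sketched in a few lines with PHP's built-in DOM extension. The function name migrateChildrenSketch is hypothetical and this is not Parsoid's implementation; it relies on DOMNode::insertBefore() appending when the reference node is null, and on insertion of an already-attached node moving it.

```php
<?php
// Hypothetical stand-in for the documented behaviour; not Parsoid's code.
function migrateChildrenSketch(DOMNode $from, DOMNode $to, ?DOMNode $beforeNode = null): void
{
    // Each insertBefore() detaches the node from $from, so firstChild advances
    // on its own until $from is empty.
    while ($from->firstChild !== null) {
        $to->insertBefore($from->firstChild, $beforeNode);
    }
}

$doc = new DOMDocument();
$doc->loadXML('<r><a><x/><y/></a><b><z/></b></r>');
$a = $doc->getElementsByTagName('a')->item(0);
$b = $doc->getElementsByTagName('b')->item(0);

// Move <x/> and <y/> in front of <z/>.
migrateChildrenSketch($a, $b, $b->firstChild);
```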

◆ migrateChildrenBetweenDocs()

static Parsoid\Utils\DOMUtils::migrateChildrenBetweenDocs ( DOMNode  $from,
DOMNode  $to,
DOMNode  $beforeNode = null 
)
static

Copy 'from'.childNodes to 'to', adding them before 'beforeNode'. 'from' and 'to' belong to different documents.

If 'beforeNode' is null, the nodes are appended at the end.

Parameters
DOMNode $from
DOMNode $to
DOMNode | null $beforeNode
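
Nodes cannot simply be re-parented across documents, so a cross-document variant has to import them first. The sketch below uses DOMDocument::importNode() for that. The function name copyChildrenBetweenDocsSketch is hypothetical, it assumes $to is an element (so that ownerDocument is set), and it leaves the originals in place; whether the real method also removes them from $from is not stated here.

```php
<?php
// Hypothetical stand-in for the documented behaviour; not Parsoid's code.
// Assumption: the source children are left untouched (pure copy).
function copyChildrenBetweenDocsSketch(DOMNode $from, DOMNode $to, ?DOMNode $beforeNode = null): void
{
    $targetDoc = $to->ownerDocument;
    for ($child = $from->firstChild; $child !== null; $child = $child->nextSibling) {
        // Deep-copy the child into the target document before inserting it.
        $copy = $targetDoc->importNode($child, true);
        $to->insertBefore($copy, $beforeNode);
    }
}

$src = new DOMDocument();
$src->loadXML('<a><x/><y/></a>');
$dst = new DOMDocument();
$dst->loadXML('<b><z/></b>');

copyChildrenBetweenDocsSketch(
    $src->documentElement,
    $dst->documentElement,
    $dst->documentElement->firstChild
);
```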

◆ nextNonDeletedSibling()

static Parsoid\Utils\DOMUtils::nextNonDeletedSibling ( DOMNode  $node)
static

Get the next non-deleted sibling.

Parameters
DOMNode $node
Returns
DOMNode|null

◆ nextNonSepSibling()

static Parsoid\Utils\DOMUtils::nextNonSepSibling ( DOMNode  $node)
static

Get the next non-separator sibling node.

Parameters
DOMNode $node
Returns
DOMNode|null

◆ nodeEssentiallyEmpty()

static Parsoid\Utils\DOMUtils::nodeEssentiallyEmpty ( DOMNode  $node,
bool  $strict = false 
)
static

Does node contain nothing or just non-newline whitespace? If $strict is set, any whitespace at all is disallowed.

Parameters
DOMNode $node
bool $strict
Returns
bool

◆ numNonDeletedChildNodes()

static Parsoid\Utils\DOMUtils::numNonDeletedChildNodes ( DOMNode  $node)
static

Return the number of non-deleted child nodes.

Parameters
DOMNode $node
Returns
int

◆ parseHTML()

static Parsoid\Utils\DOMUtils::parseHTML ( string  $html)
static

Parse HTML, return the tree.

Parameters
string $html
Returns
DOMDocument

◆ pathToAncestor()

static Parsoid\Utils\DOMUtils::pathToAncestor ( DOMNode  $node,
DOMNode  $ancestor = null 
)
static

Build path from a node to its passed-in ancestor.

Doesn't include the ancestor in the returned path.

Parameters
DOMNode $node
DOMNode | null $ancestor  $ancestor should be an ancestor of $node. If null, we'll walk to the document root.
Returns
DOMNode[]
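
As a rough illustration of the path semantics (ancestor excluded, walk to the root when $ancestor is null), consider the sketch below. The function name pathToAncestorSketch is hypothetical, and including the starting node as the first path entry is an assumption.

```php
<?php
// Hypothetical stand-in for the documented behaviour; not Parsoid's code.
// Assumption: the path starts with $node itself.
function pathToAncestorSketch(DOMNode $node, ?DOMNode $ancestor = null): array
{
    $path = [];
    $current = $node;
    // Stop at the ancestor (which is not added) or when the parent chain runs out.
    while ($current !== null && !($ancestor !== null && $current->isSameNode($ancestor))) {
        $path[] = $current;
        $current = $current->parentNode;
    }
    return $path;
}

$doc = new DOMDocument();
$doc->loadXML('<a><b><c/></b></a>');
$a = $doc->documentElement;
$c = $doc->getElementsByTagName('c')->item(0);

$toA = pathToAncestorSketch($c, $a);   // stops before <a>
$toRoot = pathToAncestorSketch($c);    // keeps going until there is no parent
```

In this sketch the null case also collects the DOMDocument node itself, since that is the last node with no parent.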

◆ pathToRoot()

static Parsoid\Utils\DOMUtils::pathToRoot ( DOMNode  $node)
static

Build path from a node to the root of the document.

Parameters
DOMNode $node
Returns
DOMNode[]

◆ pathToSibling()

static Parsoid\Utils\DOMUtils::pathToSibling ( DOMNode  $node,
DOMNode  $sibling,
bool  $left 
)
static

Build path from a node to its passed-in sibling.

Return will not include the passed-in sibling.

Parameters
DOMNode $node
DOMNode $sibling
bool $left  Whether to go backwards, using previousSibling instead of nextSibling.
Returns
DOMNode[]
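
The sibling walk, with $left choosing the direction, might look like the following sketch. The function name pathToSiblingSketch is hypothetical, and including the starting node in the path is an assumption; the target sibling is excluded as documented.

```php
<?php
// Hypothetical stand-in for the documented behaviour; not Parsoid's code.
function pathToSiblingSketch(DOMNode $node, DOMNode $sibling, bool $left): array
{
    $path = [];
    $current = $node;
    while ($current !== null && !$current->isSameNode($sibling)) {
        $path[] = $current;
        // $left walks backwards through the sibling list.
        $current = $left ? $current->previousSibling : $current->nextSibling;
    }
    return $path;
}

$doc = new DOMDocument();
$doc->loadXML('<r><a/><b/><c/><d/></r>');
$r = $doc->documentElement;
$a = $r->firstChild;
$d = $r->lastChild;

$rightwards = pathToSiblingSketch($a, $d, false);
$leftwards = pathToSiblingSketch($d, $a, true);
```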

◆ previousNonDeletedSibling()

static Parsoid\Utils\DOMUtils::previousNonDeletedSibling ( DOMNode  $node)
static

Get the previous non-deleted sibling.

Parameters
DOMNode $node
Returns
DOMNode|null

◆ previousNonSepSibling()

static Parsoid\Utils\DOMUtils::previousNonSepSibling ( DOMNode  $node)
static

Get the previous non-separator sibling node.

Parameters
DOMNode $node
Returns
DOMNode|null

◆ selectMediaElt()

static Parsoid\Utils\DOMUtils::selectMediaElt ( DOMElement  $node)
static

Returns a media element nested in node.

Parameters
DOMElement $node
Returns
DOMElement|null

◆ treeHasElement()

static Parsoid\Utils\DOMUtils::treeHasElement ( DOMNode  $node,
string  $tagName 
)
static

Check if the DOM subtree rooted at node has an element with tag name 'tagName'. The root node is not checked.

Parameters
DOMNode $node
string $tagName
Returns
bool
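
The "root node is not checked" rule is easy to get wrong, so here is a small sketch of a descendant-only search. The function name treeHasElementSketch is hypothetical, and comparing nodeName case-sensitively is an assumption.

```php
<?php
// Hypothetical stand-in for the documented behaviour; not Parsoid's code.
function treeHasElementSketch(DOMNode $node, string $tagName): bool
{
    // Only descendants are inspected; $node itself is never compared.
    for ($child = $node->firstChild; $child !== null; $child = $child->nextSibling) {
        if (!($child instanceof DOMElement)) {
            continue;
        }
        if ($child->nodeName === $tagName || treeHasElementSketch($child, $tagName)) {
            return true;
        }
    }
    return false;
}

$doc = new DOMDocument();
$doc->loadXML('<div><p>text<span/></p></div>');
$div = $doc->documentElement;
$span = $doc->getElementsByTagName('span')->item(0);
$found = treeHasElementSketch($div, 'span');
```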

◆ visitDOM()

static Parsoid\Utils\DOMUtils::visitDOM ( DOMNode  $node,
callable  $handler,
...  $args 
)
static

This is a simplified version of the DOMTraverser.

Consider using that before making this more complex.

FIXME: Move to DOMTraverser OR create a new class?

Parameters
DOMNode $node
callable $handler
mixed ...$args
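
A simplified traversal of this shape could be sketched as below. The function name visitDomSketch is hypothetical, and both the pre-order visiting order and the convention of passing the node as the handler's first argument are assumptions, not documented behaviour.

```php
<?php
// Hypothetical stand-in for the documented behaviour; not Parsoid's code.
// Assumptions: pre-order traversal; handler receives the node, then the extra args.
function visitDomSketch(DOMNode $node, callable $handler, ...$args): void
{
    $handler($node, ...$args);
    $child = $node->firstChild;
    while ($child !== null) {
        // Capture the next sibling first, in case the handler detaches $child.
        $next = $child->nextSibling;
        visitDomSketch($child, $handler, ...$args);
        $child = $next;
    }
}

$doc = new DOMDocument();
$doc->loadXML('<a><b/><c><d/></c></a>');

// The extra argument is an object, so every handler call sees the same collector.
$seen = new ArrayObject();
visitDomSketch(
    $doc->documentElement,
    function (DOMNode $n, ArrayObject $seen): void {
        $seen->append($n->nodeName);
    },
    $seen
);
```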

Member Data Documentation

◆ FIRST_ENCAP_REGEXP

const Parsoid\Utils\DOMUtils::FIRST_ENCAP_REGEXP
Initial value:
= '/(?:^|\s)(mw:(?:Transclusion|Param|LanguageVariant|Extension(\/[^\s]+)))(?=$|\s)/'
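
To see which typeof values this pattern accepts, the following sketch runs it through preg_match(). The helper firstEncapType is hypothetical and exists only for the demonstration; the pattern string is copied from the initial value above.

```php
<?php
// Pattern copied from the FIRST_ENCAP_REGEXP initial value documented above.
const FIRST_ENCAP_REGEXP_SKETCH =
    '/(?:^|\s)(mw:(?:Transclusion|Param|LanguageVariant|Extension(\/[^\s]+)))(?=$|\s)/';

// Hypothetical helper: return the first capture group, or null when nothing matches.
function firstEncapType(string $typeof): ?string
{
    return preg_match(FIRST_ENCAP_REGEXP_SKETCH, $typeof, $m) === 1 ? $m[1] : null;
}

$ext = firstEncapType('mw:Extension/ref');       // extension name is required
$bare = firstEncapType('mw:Extension');          // no "/name" suffix
$multi = firstEncapType('foo mw:Transclusion');  // token may follow whitespace
$end = firstEncapType('mw:Param/End');           // "/End" is not allowed here
```

Unlike TPL_META_TYPE_REGEXP, this pattern has no optional /End suffix, and the Extension alternative only matches when a /name part follows.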
