Parsoid
A bidirectional parser between wikitext and HTML5
DOM utilities for querying the DOM.
Static Public Member Functions | |
static | parseHTML (string $html, bool $validateXMLNames=false) |
Parse HTML, return the tree. | |
static | visitDOM (Node $node, callable $handler,... $args) |
This is a simplified version of the DOMTraverser. | |
static | migrateChildren (Node $from, Node $to, ?Node $beforeNode=null) |
Move 'from'.childNodes to 'to', adding them before 'beforeNode'. If 'beforeNode' is null, the nodes are appended at the end. | |
static | migrateChildrenBetweenDocs (Node $from, Node $to, ?Node $beforeNode=null) |
Copy 'from'.childNodes to 'to', adding them before 'beforeNode'. 'from' and 'to' belong to different documents. | |
static | assertElt (?Node $node) |
Assert that this is a DOM element node. | |
static | isRemexBlockNode (?Node $node) |
static | isWikitextBlockNode (?Node $node) |
static | isFormattingElt (?Node $node) |
Determine whether this is a formatting DOM element. | |
static | isQuoteElt (?Node $node) |
Determine whether this is a quote DOM element. | |
static | isBody (?Node $node) |
Determine whether this is the <body> DOM element. | |
static | isRemoved (?Node $node) |
Determine whether this DOM node has been removed even though its Node object still exists. | |
static | pathToRoot (Node $node) |
Build path from a node to the root of the document. | |
static | nodeDepth (Node $node) |
Compute the edge length of the path from $node to the root. | |
static | pathToSibling (Node $node, Node $sibling, bool $left) |
Build path from a node to its passed-in sibling. | |
static | inSiblingOrder (Node $n1, Node $n2) |
Check whether a node n1 comes before another node n2 in their parent's children list. | |
static | isAncestorOf (Node $n1, Node $n2) |
Check that a node 'n1' is an ancestor of another node 'n2' in the DOM. | |
static | findAncestorOfName (Node $node, string $name) |
Find an ancestor of $node with nodeName $name. | |
static | hasNameOrHasAncestorOfName (Node $node, string $name) |
Check whether $node has $name or has an ancestor named $name. | |
static | matchNameAndTypeOf (Node $n, string $name, string $typeRe) |
Determine whether the node matches the given nodeName and typeof attribute value. | |
static | hasNameAndTypeOf (Node $n, string $name, string $type) |
Determine whether the node matches the given nodeName and typeof attribute value; the typeof is given as string. | |
static | matchTypeOf (Node $n, string $typeRe) |
Determine whether the node matches the given typeof attribute value. | |
static | matchRel (Node $n, string $relRe) |
Determine whether the node matches the given rel attribute value. | |
static | hasTypeOf (Node $n, string $type) |
Determine whether the node matches the given typeof attribute value. | |
static | hasRel (Node $n, string $rel) |
Determine whether the node matches the given rel attribute value. | |
static | hasClass (Element $element, string $regex) |
static | addTypeOf (Element $node, string $type, bool $prepend=false) |
Add a type to the typeof attribute. | |
static | addRel (Element $node, string $rel) |
Add a type to the rel attribute. | |
static | removeTypeOf (Element $node, string $type) |
Remove a type from the typeof attribute. | |
static | removeRel (Element $node, string $rel) |
Remove a type from the rel attribute. | |
static | isFosterablePosition (?Node $n) |
Check whether node is in a fosterable position. | |
static | isHeading (?Node $n) |
Check whether node is a heading. | |
static | isList (?Node $n) |
Check whether node is a list. | |
static | isListItem (?Node $n) |
Check whether node is a list item. | |
static | isListOrListItem (?Node $n) |
Check whether node is a list or list item. | |
static | isNestedInListItem (?Node $n) |
Check whether node is nested in a list item. | |
static | isNestedListOrListItem (?Node $n) |
Check whether node is a nested list or a list item. | |
static | isMarkerMeta (Node $n, string $type) |
Check a node to see whether it's a meta with some typeof. | |
static | hasElementChild (Node $node) |
Check whether a node has any children that are elements. | |
static | hasBlockElementDescendant (Node $node) |
Check if a node has a block-level element descendant. | |
static | isIEW (?Node $node) |
Is a node representing inter-element whitespace? | |
static | isDocumentFragment (?Node $node) |
Is a node a document fragment? | |
static | atTheTop (?Node $node) |
Is a node at the top? | |
static | allChildrenAreTextOrComments (Node $node) |
Are all children of this node text or comment nodes? | |
static | treeHasElement (Node $node, string $tagName, bool $checkRoot=false) |
Check if the DOM subtree rooted at node has an element with tag name 'tagName'. By default, the root node is not checked. | |
static | isTableTag (Node $node) |
Is node a table tag (table, tbody, td, tr, etc.)? | |
static | selectMediaElt (Element $node) |
Returns a media element nested in node. | |
static | findHttpEquivHeaders (Document $doc) |
Extract http-equiv headers from the HTML, including content-language and vary headers, if present. | |
static | addHttpEquivHeaders (Document $doc, array $headers) |
Add or replace http-equiv headers in the HTML <head>. | |
static | extractInlinedContentVersion (Document $doc) |
static | addAttributes (Element $elt, array $attrs) |
Add attributes to a node element. | |
static | appendToHead (Document $document, string $tagName, array $attrs=[]) |
Create an element in the document head with the given attrs. | |
static | getFragmentInnerHTML (DocumentFragment $frag) |
innerHTML and outerHTML are not defined on DocumentFragment. | |
static | setFragmentInnerHTML (DocumentFragment $frag, string $html) |
innerHTML and outerHTML are not defined on DocumentFragment. | |
static | parseHTMLToFragment (Document $doc, string $html) |
static | isRawTextElement (Node $node) |
static | hasBlockTag (Node $n) |
Is 'n' a block tag, or does the subtree rooted at 'n' have a block tag in it? | |
static | attributes (Element $element) |
Get an associative array of attributes, suitable for serialization. | |
static | isMetaDataTag (Element $node) |
static | stripPWrapper (string $ret) |
Strip a paragraph wrapper, if any, before parsing HTML to DOM. | |
DOM utilities for querying the DOM.
This is largely independent of Parsoid although some Parsoid details (TokenUtils, inline content version) have snuck in.
static | addAttributes (Element $elt, array $attrs) |
Add attributes to a node element.
Element | $elt | element |
array | $attrs | attributes |
static | addHttpEquivHeaders (Document $doc, array $headers) |
Add or replace http-equiv headers in the HTML <head>.
This is used for content-language and vary headers, among possible others.
Document | $doc | The HTML document to update |
array<string,string|string[]> | $headers | An array mapping HTTP header names (which are case-insensitive) to new values. If an array of values is provided, they will be joined with commas. |
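The shape of the $headers argument can be illustrated with a small standalone sketch. The helper name normalizeHeaderValues is hypothetical and is not part of DOMUtils. Folding names to lowercase and joining with a bare comma are assumptions made for illustration only.

```php
<?php
/**
 * Hypothetical helper (not part of DOMUtils): flatten a header map so that
 * every value is a single string. Array values are joined with commas.
 *
 * @param array<string,string|string[]> $headers
 * @return array<string,string>
 */
function normalizeHeaderValues(array $headers): array {
    $out = [];
    foreach ($headers as $name => $value) {
        // Header names are case-insensitive, so fold them to lowercase
        // (an assumption of this sketch) to avoid duplicate keys.
        $key = strtolower($name);
        $out[$key] = is_array($value) ? implode(',', $value) : $value;
    }
    return $out;
}

$flat = normalizeHeaderValues([
    'Content-Language' => 'en',
    'Vary' => ['Accept', 'Accept-Language'],
]);
// $flat['vary'] is now the single string 'Accept,Accept-Language'.
```

A map prepared this way has exactly one string per header name, which is the form an http-equiv meta element can carry in its content attribute.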
static | addRel (Element $node, string $rel) |
Add a type to the rel attribute.
This method should almost always be used instead of setAttribute, to ensure we don't overwrite existing rel information.
Element | $node | node |
string | $rel | type |
static | addTypeOf (Element $node, string $type, bool $prepend=false) |
Add a type to the typeof attribute.
This method should almost always be used instead of setAttribute, to ensure we don't overwrite existing typeof information.
Element | $node | node |
string | $type | type |
bool | $prepend | If true, adds value to start, rather than end. Use of this option in new code is discouraged. |
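To show why a plain setAttribute call would lose information, here is a minimal sketch of the add-without-overwrite behaviour. It uses PHP's built-in DOMElement, not Parsoid's Element class. The function name sketchAddTypeOf is hypothetical, and skipping a token that is already present is an assumption of this sketch. The typeof tokens are example values only.

```php
<?php
/**
 * Hypothetical stand-in for the behaviour described above, written against
 * PHP's built-in DOM extension.
 */
function sketchAddTypeOf(DOMElement $node, string $type, bool $prepend = false): void {
    $current = trim($node->getAttribute('typeof'));
    // typeof holds space-separated tokens; split on any whitespace run.
    $types = $current === '' ? [] : preg_split('/\s+/', $current);
    if (in_array($type, $types, true)) {
        return; // already present (assumption: duplicates are not added)
    }
    if ($prepend) {
        array_unshift($types, $type);
    } else {
        $types[] = $type;
    }
    $node->setAttribute('typeof', implode(' ', $types));
}

$doc = new DOMDocument();
$span = $doc->createElement('span');
$span->setAttribute('typeof', 'mw:Transclusion');
sketchAddTypeOf($span, 'mw:Image');
// The existing token survives: typeof is 'mw:Transclusion mw:Image'.
```

Calling setAttribute('typeof', 'mw:Image') directly would have replaced the first token instead of extending the list.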
static | allChildrenAreTextOrComments (Node $node) |
Are all children of this node text or comment nodes?
Node | $node |
static | appendToHead (Document $document, string $tagName, array $attrs=[]) |
Create an element in the document head with the given attrs.
Creates the head element in the document if needed.
Document | $document | |
string | $tagName | |
array | $attrs |
static | assertElt (?Node $node) |
Assert that this is a DOM element node.
This is primarily to help phan analyze variable types. @phan-assert Element $node
?Node | $node |
static | atTheTop (?Node $node) |
Is a node at the top?
?Node | $node |
static | attributes (Element $element) |
Get an associative array of attributes, suitable for serialization.
Add the xmlns attribute if available, to work around PHP's surprising behavior with the xmlns attribute: HTML is not an XML document, but various parts of PHP (including our misnamed XMLSerializer) pretend that it is, sort of.
Element | $element |
static | extractInlinedContentVersion (Document $doc) |
Document | $doc |
static | findAncestorOfName (Node $node, string $name) |
Find an ancestor of $node with nodeName $name.
Node | $node | |
string | $name |
static | findHttpEquivHeaders (Document $doc) |
Extract http-equiv headers from the HTML, including content-language and vary headers, if present.
Document | $doc |
static | getFragmentInnerHTML (DocumentFragment $frag) |
innerHTML and outerHTML are not defined on DocumentFragment.
Defined similarly to DOMCompat::getInnerHTML()
DocumentFragment | $frag |
static | hasBlockElementDescendant (Node $node) |
Check if a node has a block-level element descendant.
Node | $node |
static | hasBlockTag (Node $n) |
Is 'n' a block tag, or does the subtree rooted at 'n' have a block tag in it?
Node | $n |
static | hasClass (Element $element, string $regex) |
Element | $element | |
string | $regex | Partial regular expression, e.g. "foo|bar" |
static | hasElementChild (Node $node) |
Check whether a node has any children that are elements.
Node | $node |
static | hasNameAndTypeOf (Node $n, string $name, string $type) |
Determine whether the node matches the given nodeName and typeof attribute value; the typeof is given as string.
Node | $n | |
string | $name | node name to test for |
string | $type | Expected value of "typeof" attribute (literal string) |
static | hasNameOrHasAncestorOfName (Node $node, string $name) |
Check whether $node has $name or has an ancestor named $name.
Node | $node | |
string | $name |
static | hasRel (Node $n, string $rel) |
Determine whether the node matches the given rel attribute value.
Node | $n | |
string | $rel | Expected value of "rel" attribute, as a literal string. |
static | hasTypeOf (Node $n, string $type) |
Determine whether the node matches the given typeof attribute value.
Node | $n | |
string | $type | Expected value of "typeof" attribute, as a literal string. |
static | inSiblingOrder (Node $n1, Node $n2) |
Check whether a node n1 comes before another node n2 in their parent's children list.
Node | $n1 | The node you expect to come first. |
Node | $n2 | Expected later sibling. |
static | isAncestorOf (Node $n1, Node $n2) |
Check that a node 'n1' is an ancestor of another node 'n2' in the DOM.
Returns true if n1 === n2.
Node | $n1 | The suspected ancestor. |
Node | $n2 | The suspected descendant. |
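The inclusive behaviour (a node counts as its own ancestor) is easy to miss, so here is a minimal sketch of one way such a check could work. It uses PHP's built-in DOMNode, not Parsoid's Node class. The function name sketchIsAncestorOf is hypothetical, and the upward walk is an assumed strategy, not necessarily the one DOMUtils uses.

```php
<?php
/** Hypothetical sketch using PHP's built-in DOM extension. */
function sketchIsAncestorOf(DOMNode $n1, DOMNode $n2): bool {
    // Walk upward from the suspected descendant. The first iteration
    // compares $n2 with $n1 directly, which is why n1 === n2 yields true.
    for ($n = $n2; $n !== null; $n = $n->parentNode) {
        if ($n->isSameNode($n1)) {
            return true;
        }
    }
    return false;
}

$doc = new DOMDocument();
$p = $doc->createElement('p');
$b = $doc->createElement('b');
$doc->appendChild($p);
$p->appendChild($b);
// sketchIsAncestorOf($p, $b) and sketchIsAncestorOf($p, $p) both hold;
// sketchIsAncestorOf($b, $p) does not.
```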
static | isBody (?Node $node) |
Determine whether this is the <body> DOM element.
?Node | $node |
static | isDocumentFragment (?Node $node) |
Is a node a document fragment?
?Node | $node |
static | isFormattingElt (?Node $node) |
Determine whether this is a formatting DOM element.
?Node | $node |
static | isFosterablePosition (?Node $n) |
Check whether node is in a fosterable position.
?Node | $n |
static | isHeading (?Node $n) |
Check whether node is a heading.
?Node | $n |
static | isIEW (?Node $node) |
Is a node representing inter-element whitespace?
?Node | $node |
static | isList (?Node $n) |
Check whether node is a list.
?Node | $n |
static | isListItem (?Node $n) |
Check whether node is a list item.
?Node | $n |
static | isListOrListItem (?Node $n) |
Check whether node is a list or list item.
?Node | $n |
static | isMarkerMeta (Node $n, string $type) |
Check a node to see whether it's a meta with some typeof.
Node | $n | |
string | $type |
static | isMetaDataTag (Element $node) |
Element | $node |
static | isNestedInListItem (?Node $n) |
Check whether node is nested in a list item.
?Node | $n |
static | isNestedListOrListItem (?Node $n) |
Check whether node is a nested list or a list item.
?Node | $n |
static | isQuoteElt (?Node $node) |
Determine whether this is a quote DOM element.
?Node | $node |
static | isRawTextElement (Node $node) |
Node | $node |
static | isRemexBlockNode (?Node $node) |
?Node | $node |
static | isRemoved (?Node $node) |
Determine whether this DOM node has been removed even though its Node object still exists.
?Node | $node |
static | isTableTag (Node $node) |
Is node a table tag (table, tbody, td, tr, etc.)?
Node | $node |
static | isWikitextBlockNode (?Node $node) |
?Node | $node |
static | matchNameAndTypeOf (Node $n, string $name, string $typeRe) |
Determine whether the node matches the given nodeName and typeof attribute value.
The node matches when its node name equals $name and its typeof attribute matches $typeRe.
Node | $n | The node to test |
string | $name | The expected nodeName of $n |
string | $typeRe | Regular expression matching the expected value of typeof attribute. |
Returns the matching typeof value, or null if there is no match.
static | matchRel (Node $n, string $relRe) |
Determine whether the node matches the given rel attribute value.
Node | $n | The node to test |
string | $relRe | Regular expression matching the expected value of the rel attribute. |
Returns the matching rel value, or null if there is no match.
static | matchTypeOf (Node $n, string $typeRe) |
Determine whether the node matches the given typeof attribute value.
Node | $n | The node to test |
string | $typeRe | Regular expression matching the expected value of the typeof attribute. |
Returns the matching typeof value, or null if there is no match.
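The return contract (a matching value or null, not a boolean) can be sketched as follows. This uses PHP's built-in DOM extension, and the function name sketchMatchTypeOf is hypothetical. Testing the regular expression against each space-separated token in turn is an assumption of this sketch. The typeof tokens are example values.

```php
<?php
/** Hypothetical sketch: return the first typeof token matching $typeRe. */
function sketchMatchTypeOf(DOMNode $n, string $typeRe): ?string {
    if (!($n instanceof DOMElement)) {
        return null; // only elements carry attributes
    }
    // Split the attribute into tokens, dropping empty strings.
    $tokens = preg_split('/\s+/', $n->getAttribute('typeof'), -1, PREG_SPLIT_NO_EMPTY);
    foreach ($tokens as $token) {
        if (preg_match($typeRe, $token) === 1) {
            return $token;
        }
    }
    return null;
}

$doc = new DOMDocument();
$span = $doc->createElement('span');
$span->setAttribute('typeof', 'mw:Transclusion mw:Image/Thumb');
$hit = sketchMatchTypeOf($span, '#^mw:Image(/|$)#');
// $hit is the matching token 'mw:Image/Thumb', not just true.
```

Returning the token lets a caller branch on which variant matched without parsing the attribute a second time.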
static | migrateChildren (Node $from, Node $to, ?Node $beforeNode=null) |
Move 'from'.childNodes to 'to', adding them before 'beforeNode'. If 'beforeNode' is null, the nodes are appended at the end.
Node | $from | Source node. Children will be removed. |
Node | $to | Destination node. Children of $from will be added here |
?Node | $beforeNode | Add the children before this node. |
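As an illustration of the move semantics, here is a minimal sketch against PHP's built-in DOM extension. The function name sketchMigrateChildren is hypothetical, and the sketch assumes both nodes belong to the same document. It is not necessarily how DOMUtils implements the move.

```php
<?php
/** Hypothetical sketch: move all children of $from into $to. */
function sketchMigrateChildren(DOMNode $from, DOMNode $to, ?DOMNode $beforeNode = null): void {
    // Re-read firstChild on every pass: inserting a node elsewhere
    // detaches it from $from, so the child list shrinks as we go.
    while ($from->firstChild !== null) {
        // insertBefore() with a null reference node appends at the end.
        $to->insertBefore($from->firstChild, $beforeNode);
    }
}

$doc = new DOMDocument();
$src = $doc->createElement('div');
$dst = $doc->createElement('div');
$src->appendChild($doc->createElement('b'));
$src->appendChild($doc->createElement('i'));
$anchor = $dst->appendChild($doc->createElement('u'));
sketchMigrateChildren($src, $dst, $anchor);
// $dst now contains b, i, u in that order and $src is empty.
```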
static | migrateChildrenBetweenDocs (Node $from, Node $to, ?Node $beforeNode=null) |
Copy 'from'.childNodes to 'to', adding them before 'beforeNode'. 'from' and 'to' belong to different documents.
If 'beforeNode' is null, the nodes are appended at the end.
Node | $from | |
Node | $to | |
?Node | $beforeNode |
static | nodeDepth (Node $node) |
Compute the edge length of the path from $node to the root.
Root document is at depth 0, <html> at 1, <body> at 2.
Node | $node |
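The depth convention above can be checked with a short standalone sketch. It uses PHP's built-in DOMDocument, and the function name sketchNodeDepth is hypothetical. Counting parent hops is one straightforward way to get these numbers and is assumed here, not taken from the DOMUtils source.

```php
<?php
/** Hypothetical sketch: count the edges between $node and the root. */
function sketchNodeDepth(DOMNode $node): int {
    $edges = 0;
    // Count one edge for every hop to a parent until a node with no
    // parent (normally the document node) is reached.
    while ($node->parentNode !== null) {
        $node = $node->parentNode;
        $edges++;
    }
    return $edges;
}

$doc = new DOMDocument();
$doc->loadHTML('<html><body><p>hi</p></body></html>');
$body = $doc->getElementsByTagName('body')->item(0);
// document = 0, <html> = 1, <body> = 2, <p> = 3
```

Note that in this sketch a detached node with no parent also reports depth 0.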
static | parseHTML (string $html, bool $validateXMLNames=false) |
Parse HTML, return the tree.
string | $html | |
bool | $validateXMLNames |
static | parseHTMLToFragment (Document $doc, string $html) |
Document | $doc | |
string | $html |
static | pathToRoot (Node $node) |
Build path from a node to the root of the document.
Node | $node |
static | pathToSibling (Node $node, Node $sibling, bool $left) |
Build path from a node to its passed-in sibling.
Return will not include the passed-in sibling.
Node | $node | |
Node | $sibling | |
bool | $left | Whether to walk backwards, using previousSibling instead of nextSibling. |
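The direction flag and the exclusion of the target sibling can be sketched as follows, using PHP's built-in DOM extension. The function name sketchPathToSibling is hypothetical. Including the starting node as the first path entry is an assumption of this sketch.

```php
<?php
/** Hypothetical sketch: collect nodes from $node up to (not including) $sibling. */
function sketchPathToSibling(DOMNode $node, DOMNode $sibling, bool $left): array {
    $path = [];
    $cur = $node;
    // Stop when the target sibling is reached or the sibling list ends.
    while ($cur !== null && !$cur->isSameNode($sibling)) {
        $path[] = $cur;
        $cur = $left ? $cur->previousSibling : $cur->nextSibling;
    }
    return $path;
}

$doc = new DOMDocument();
$ul = $doc->appendChild($doc->createElement('ul'));
$items = [];
foreach (['a', 'b', 'c', 'd'] as $id) {
    $li = $doc->createElement('li');
    $li->setAttribute('id', $id);
    $items[$id] = $ul->appendChild($li);
}
$path = sketchPathToSibling($items['a'], $items['c'], false);
// $path holds the items with ids 'a' and 'b'; 'c' itself is excluded.
```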
static | removeRel (Element $node, string $rel) |
Remove a type from the rel attribute.
Element | $node | node |
string | $rel | rel |
static | removeTypeOf (Element $node, string $type) |
Remove a type from the typeof attribute.
Element | $node | node |
string | $type | type |
static | selectMediaElt (Element $node) |
Returns a media element nested in node.
Element | $node |
static | setFragmentInnerHTML (DocumentFragment $frag, string $html) |
innerHTML and outerHTML are not defined on DocumentFragment.
DocumentFragment | $frag | |
string | $html |
static | treeHasElement (Node $node, string $tagName, bool $checkRoot=false) |
Check if the DOM subtree rooted at node has an element with tag name 'tagName'. By default, the root node is not checked.
Node | $node | The DOM node whose tree should be checked |
string | $tagName | Tag name to look for |
bool | $checkRoot | Should the root be checked? |
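The effect of $checkRoot is easiest to see in a small recursive sketch. It uses PHP's built-in DOM extension, and the function name sketchTreeHasElement is hypothetical. The exact, case-sensitive comparison of nodeName is an assumption made for brevity.

```php
<?php
/** Hypothetical sketch: search a subtree for an element named $tagName. */
function sketchTreeHasElement(DOMNode $node, string $tagName, bool $checkRoot = false): bool {
    if ($checkRoot && $node instanceof DOMElement && $node->nodeName === $tagName) {
        return true;
    }
    for ($child = $node->firstChild; $child !== null; $child = $child->nextSibling) {
        // Descendants are always eligible, so recurse with $checkRoot = true.
        if (sketchTreeHasElement($child, $tagName, true)) {
            return true;
        }
    }
    return false;
}

$doc = new DOMDocument();
$table = $doc->appendChild($doc->createElement('table'));
$tr = $table->appendChild($doc->createElement('tr'));
$td = $tr->appendChild($doc->createElement('td'));
// sketchTreeHasElement($table, 'table') is false by default because the
// root is skipped; passing true as the third argument makes it match.
```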
static | visitDOM (Node $node, callable $handler,... $args) |
This is a simplified version of the DOMTraverser.
Consider using that before making this more complex.
FIXME: Move to DOMTraverser OR create a new class?
Node | $node | |
callable | $handler | |
mixed | ...$args |