RemexHtml
Fast HTML 5 parser
A TreeHandler which constructs a DOMDocument.
Public Member Functions
  __construct( $options = [] )
  DOMNode getFragment()
      Get the constructed document or document fragment.
  bool isCoerced()
      Returns true if the document was coerced due to libxml limitations.
  startDocument( $fragmentNamespace, $fragmentName )
      Called when parsing starts.
  endDocument( $pos )
      Called when parsing stops.
  insertElement( $preposition, $refElement, Element $element, $void, $sourceStart, $sourceLength )
      Insert an element.
  endTag( Element $element, $sourceStart, $sourceLength )
      A hint that an element was closed and was removed from the stack of open elements.
  doctype( $name, $public, $system, $quirks, $sourceStart, $sourceLength )
      A valid DOCTYPE token was found.
  comment( $preposition, $refElement, $text, $sourceStart, $sourceLength )
      Insert a comment.
  error( $text, $pos )
      A parse error.
  removeNode( Element $element, $sourceStart )
      Remove a node from the tree, and all its children.
Public Member Functions inherited from Wikimedia\RemexHtml\TreeBuilder\TreeHandler
  characters( $preposition, $ref, $text, $start, $length, $sourceStart, $sourceLength )
      Insert characters.
  mergeAttributes( Element $element, Attributes $attrs, $sourceStart )
      Add attributes to an existing element.
  reparentChildren( Element $element, Element $newParent, $sourceStart )
      Take all children of a given parent $element, and insert them as children of $newParent, removing them from their original parent in the process.
Public Attributes
  string|null $doctypeName
      The name of the input document type.
  string|null $public
      The public ID.
  string|null $system
      The system ID.
  int $quirks
      The quirks mode.
Protected Member Functions
  DOMDocument createDocument( string $doctypeName = null, string $public = null, string $system = null )
  insertNode( $preposition, $refElement, $node )
A TreeHandler which constructs a DOMDocument.

Note that this class permits third-party DOMImplementations (documents other than \DOMDocument, nodes other than \DOMNode, etc.), and so no enforced PHP type hints are used which name these classes directly. For the sake of static type checking, the types in comments are given as if the standard PHP \DOM* classes are being used, but at runtime everything is duck-typed.
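A minimal sketch of the duck-typing convention described above, using only PHP's bundled DOM extension. The helper `appendText()` is hypothetical (not part of RemexHtml): its doc comment names the standard `\DOM*` classes for static analysis, but it declares no native type hints, so any object exposing compatible methods would be accepted at runtime.

```php
<?php
/**
 * Hypothetical helper, not part of RemexHtml. The doc comment names the
 * standard \DOM* classes for static analysis, but there are no native type
 * hints, so a third-party DOM implementation with the same methods also works.
 *
 * @param \DOMDocument $doc
 * @param \DOMNode $parent
 * @param string $text
 * @return \DOMNode The new text node
 */
function appendText( $doc, $parent, $text ) {
	$node = $doc->createTextNode( $text );
	$parent->appendChild( $node );
	return $node;
}

$doc = new DOMDocument();
$p = $doc->appendChild( $doc->createElement( 'p' ) );
appendText( $doc, $p, 'Hello' );
echo $p->textContent, "\n"; // Hello
```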
Wikimedia\RemexHtml\DOM\DOMBuilder::__construct( $options = [] )
  array $options  An associative array of options:
Wikimedia\RemexHtml\DOM\DOMBuilder::comment( $preposition, $ref, $text, $sourceStart, $sourceLength )
Insert a comment.
  int $preposition  The placement of the new node with respect to $ref. May be TreeBuilder::
  Element|null $ref  Insert before/below this element, or null if $preposition is ROOT.
  string $text  The text of the comment.
  int $sourceStart  The input position.
  int $sourceLength  The length of the input which is consumed.
Implements Wikimedia\RemexHtml\TreeBuilder\TreeHandler.
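The $preposition/$ref pair decides where a new node lands: under the document root, before $ref, or below $ref. The following is a sketch of that placement logic with plain DOM calls. The constants PREP_ROOT, PREP_BEFORE and PREP_BELOW and the function `placeNode()` are hypothetical stand-ins; the real constant names and values come from RemexHtml's TreeBuilder class, and this is not DOMBuilder's actual insertNode() code.

```php
<?php
// Hypothetical stand-ins for TreeBuilder's preposition constants. The real
// constant names and values are defined by RemexHtml and may differ.
const PREP_ROOT = 0;
const PREP_BEFORE = 1;
const PREP_BELOW = 2;

/**
 * Place $node relative to $ref (sketch; duck-typed like DOMBuilder).
 *
 * @param \DOMDocument $doc
 * @param int $preposition
 * @param \DOMNode|null $ref
 * @param \DOMNode $node
 */
function placeNode( $doc, $preposition, $ref, $node ) {
	switch ( $preposition ) {
		case PREP_ROOT:
			// No reference element: attach directly to the document.
			$doc->appendChild( $node );
			break;
		case PREP_BEFORE:
			// Insert as the previous sibling of $ref.
			$ref->parentNode->insertBefore( $node, $ref );
			break;
		case PREP_BELOW:
			// Append as the last child of $ref.
			$ref->appendChild( $node );
			break;
		default:
			throw new InvalidArgumentException( "Unknown preposition: $preposition" );
	}
}

$doc = new DOMDocument();
$html = $doc->createElement( 'html' );
placeNode( $doc, PREP_ROOT, null, $html );
$body = $doc->createElement( 'body' );
placeNode( $doc, PREP_BELOW, $html, $body );
// A comment placed before <body>, as a comment() call with a "before" preposition would.
placeNode( $doc, PREP_BEFORE, $body, $doc->createComment( ' before body ' ) );
```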
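DOMDocument Wikimedia\RemexHtml\DOM\DOMBuilder::createDocument( string $doctypeName = null, string $public = null, string $system = null ) [protected]
  string|null $doctypeName
  string|null $public
  string|null $system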
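As a rough illustration of what a document-creation step with these three parameters involves, here is a sketch built on PHP's standard DOMImplementation. The function `createDocumentSketch()` is hypothetical and is not RemexHtml's createDocument() implementation; it only shows how a doctype name and public/system IDs can become a DOCTYPE node on a fresh document.

```php
<?php
/**
 * Hypothetical sketch: build an empty document carrying a DOCTYPE node,
 * or a bare document when no DOCTYPE token was seen.
 *
 * @param string|null $doctypeName
 * @param string|null $public
 * @param string|null $system
 * @return \DOMDocument
 */
function createDocumentSketch( $doctypeName = null, $public = null, $system = null ) {
	$impl = new DOMImplementation();
	if ( $doctypeName === null ) {
		// No DOCTYPE: a document with neither doctype nor root element.
		return $impl->createDocument( null, '' );
	}
	// Missing IDs are passed as empty strings.
	$doctype = $impl->createDocumentType( $doctypeName, $public ?? '', $system ?? '' );
	return $impl->createDocument( null, '', $doctype );
}

$doc = createDocumentSketch( 'html' );
echo $doc->doctype->name, "\n"; // html
```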
Wikimedia\RemexHtml\DOM\DOMBuilder::doctype( $name, $public, $system, $quirks, $sourceStart, $sourceLength )
A valid DOCTYPE token was found.
  string $name  The doctype name, usually "html".
  string $public  The PUBLIC identifier.
  string $system  The SYSTEM identifier.
  int $quirks  The quirks mode implied from the doctype. One of TreeBuilder::NO_QUIRKS, TreeBuilder::LIMITED_QUIRKS or TreeBuilder::QUIRKS.
  int $sourceStart  The input position.
  int $sourceLength  The length of the input which is consumed.
Implements Wikimedia\RemexHtml\TreeBuilder\TreeHandler.
Wikimedia\RemexHtml\DOM\DOMBuilder::endDocument( $pos )
Called when parsing stops.
  int $pos  The input string length, i.e. the past-the-end position.
Implements Wikimedia\RemexHtml\TreeBuilder\TreeHandler.
Wikimedia\RemexHtml\DOM\DOMBuilder::endTag( Element $element, $sourceStart, $sourceLength )
A hint that an element was closed and was removed from the stack of open elements.
It probably won't be mutated again.
  Element $element  The element being ended.
  int $sourceStart  The input position.
  int $sourceLength  The length of the input which is consumed.
Implements Wikimedia\RemexHtml\TreeBuilder\TreeHandler.
Wikimedia\RemexHtml\DOM\DOMBuilder::error( $text, $pos )
A parse error.
  string $text  An error message explaining in English what the author did wrong, and what the parser intends to do about the situation.
  int $pos  The input position at which the error occurred.
Implements Wikimedia\RemexHtml\TreeBuilder\TreeHandler.
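When reporting errors to a human, the $pos offset is often more useful as a line and column. The helper below is a hypothetical sketch for that conversion. It assumes $pos is a byte offset into the input string, so the column it computes counts bytes rather than characters.

```php
<?php
/**
 * Hypothetical helper: turn the $pos passed to error( $text, $pos ) into a
 * 1-based line and column. Assumes $pos is a byte offset into $input.
 *
 * @return int[] [ line, column ]
 */
function offsetToLineColumn( string $input, int $pos ): array {
	$before = substr( $input, 0, $pos );
	// Every newline before $pos starts a new line.
	$line = substr_count( $before, "\n" ) + 1;
	$lastNewline = strrpos( $before, "\n" );
	// Column is measured from the character after the last newline.
	$column = $lastNewline === false ? $pos + 1 : $pos - $lastNewline;
	return [ $line, $column ];
}

$input = "<p>\n<b>x</p>";
// Offset 8 is the "<" of "</p>" on the second line.
[ $line, $column ] = offsetToLineColumn( $input, 8 );
echo "line $line, column $column\n"; // line 2, column 5
```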
DOMNode Wikimedia\RemexHtml\DOM\DOMBuilder::getFragment()
Get the constructed document or document fragment.
In the fragment case, a DOMElement is returned, and the caller is expected to extract its inner contents, ignoring the wrapping element. This convention is convenient because the wrapping element gives libxml somewhere to put its namespace declarations. If we copied the children into a DOMDocumentFragment, libxml would invent new prefixes for the orphaned namespaces.
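The caller's side of that convention can be sketched with PHP's bundled DOM extension: serialize each child of the wrapper element and concatenate the results, skipping the wrapper's own tags. The helper `innerHtml()` is hypothetical, and the hand-built wrapper only stands in for what getFragment() would return; serializing with `saveHTML()` is one choice among several.

```php
<?php
/**
 * Hypothetical helper: serialize the children of $wrapper, omitting the
 * wrapper element itself.
 *
 * @param \DOMElement $wrapper
 * @return string
 */
function innerHtml( $wrapper ) {
	$html = '';
	foreach ( $wrapper->childNodes as $child ) {
		// saveHTML() with a node argument serializes just that subtree.
		$html .= $wrapper->ownerDocument->saveHTML( $child );
	}
	return $html;
}

// Stand-in for the wrapper element a fragment parse would hand back.
$doc = new DOMDocument();
$wrapper = $doc->appendChild( $doc->createElement( 'html' ) );
$wrapper->appendChild( $doc->createElement( 'b', 'bold' ) );
$wrapper->appendChild( $doc->createTextNode( ' text' ) );

echo innerHtml( $wrapper ), "\n";
```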
Wikimedia\RemexHtml\DOM\DOMBuilder::insertElement( $preposition, $ref, Element $element, $void, $sourceStart, $sourceLength )
Insert an element.
The element name and attributes are given in the supplied Element object. Handlers for this event typically attach an identifier to the userData property of the Element object, to identify the element when it is used again in subsequent tree mutations.
  int $preposition  The placement of the new node with respect to $ref. May be TreeBuilder::
  Element|null $ref  Insert before/below this element, or null if $preposition is ROOT.
  Element $element  An object containing information about the new element. The same object will be passed again (for example as $ref or $element) in later calls as appropriate. The handler can set $element->userData to attach a suitable DOM object to identify the mutation target in subsequent calls.
  bool $void  True if this is a void element, which cannot have any children appended to it. This is usually true if the element is closed by the same token that opened it. No endTag() event will be sent for such an element. This is only true if self-closing tags are acknowledged for this tag name, so it is a hint to the serializer that a self-closing tag is acceptable.
  int $sourceStart  The input position.
  int $sourceLength  The length of the input which is consumed.
Implements Wikimedia\RemexHtml\TreeBuilder\TreeHandler.
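The userData convention can be illustrated without RemexHtml itself. In this sketch, `FakeElement` is a hypothetical stand-in for the TreeBuilder Element class (only its `userData` property mirrors the real one), and `SketchHandler` is a hypothetical handler whose insertElement() is simplified to take a parent element instead of $preposition and $ref. It shows only how storing the DOM node in `userData` lets later events find it.

```php
<?php
// Hypothetical stand-in for the TreeBuilder Element class, reduced to the
// properties this sketch uses.
class FakeElement {
	public $name;
	public $userData = null;

	public function __construct( $name ) {
		$this->name = $name;
	}
}

// Hypothetical handler showing the userData convention only.
class SketchHandler {
	public $doc;

	public function __construct() {
		$this->doc = new DOMDocument();
	}

	public function insertElement( $parent, FakeElement $element ) {
		$node = $this->doc->createElement( $element->name );
		// Remember the DOM node so later events can find it from $element.
		$element->userData = $node;
		$target = $parent === null ? $this->doc : $parent->userData;
		$target->appendChild( $node );
	}
}

$handler = new SketchHandler();
$html = new FakeElement( 'html' );
$body = new FakeElement( 'body' );
$handler->insertElement( null, $html );
$handler->insertElement( $html, $body );

// A later event that receives $body reaches the DOM node via userData.
echo $body->userData->parentNode->nodeName, "\n"; // html
```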
bool Wikimedia\RemexHtml\DOM\DOMBuilder::isCoerced()
Returns true if the document was coerced due to libxml limitations.
We follow HTML 5.1 § 8.2.7 "Coercing an HTML DOM into an infoset".
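One part of that section lets a tool replace each character that the XML API cannot accept in a name with "U" followed by the character's code point as six uppercase hexadecimal digits. The sketch below applies that rule with a deliberately simplified validity test (ASCII letters, digits, "_", "." and "-" only, with no digit, "." or "-" first), whereas real XML names allow far more characters. `coerceName()` and `codePoint()` are hypothetical helpers, not RemexHtml's coercion code.

```php
<?php
/**
 * Decode one UTF-8 encoded character to its code point.
 */
function codePoint( string $ch ): int {
	$bytes = array_values( unpack( 'C*', $ch ) );
	$n = count( $bytes );
	if ( $n === 1 ) {
		return $bytes[0];
	}
	// Keep the payload bits of the lead byte, then 6 bits per continuation byte.
	$cp = $bytes[0] & ( 0xFF >> ( $n + 1 ) );
	for ( $i = 1; $i < $n; $i++ ) {
		$cp = ( $cp << 6 ) | ( $bytes[$i] & 0x3F );
	}
	return $cp;
}

/**
 * Simplified coercion: any character outside the ASCII subset accepted here
 * becomes "U" plus its code point as six uppercase hex digits.
 */
function coerceName( string $name ): string {
	$out = '';
	foreach ( preg_split( '//u', $name, -1, PREG_SPLIT_NO_EMPTY ) as $i => $ch ) {
		$pattern = $i === 0 ? '/^[A-Za-z_]$/' : '/^[A-Za-z0-9_.\-]$/';
		$out .= preg_match( $pattern, $ch ) ? $ch : sprintf( 'U%06X', codePoint( $ch ) );
	}
	return $out;
}

$doc = new DOMDocument();
try {
	$doc->createElement( 'a<b' );
} catch ( DOMException $e ) {
	// The libxml-backed DOM rejects names that are not valid XML names.
	echo "rejected: a<b\n";
}
echo $doc->createElement( coerceName( 'a<b' ) )->nodeName, "\n"; // aU00003Cb
```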
Wikimedia\RemexHtml\DOM\DOMBuilder::removeNode( Element $element, $sourceStart )
Remove a node from the tree, and all its children.
This is only done when a <frameset> element is found, which triggers removal of the partially-constructed body element.
  Element $element  The element to remove.
  int $sourceStart  The location in the source at which this action was triggered.
Implements Wikimedia\RemexHtml\TreeBuilder\TreeHandler.
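In DOM terms, removing a node "and all its children" is a single detach: removeChild() takes the whole subtree with it. This sketch builds a small tree by hand with PHP's DOM extension and drops the partially built body before a frameset is appended. It mirrors the situation described above but is not DOMBuilder's removeNode() code.

```php
<?php
// Build <html><head></head><body>partial content</body></html> by hand.
$doc = new DOMDocument();
$html = $doc->appendChild( $doc->createElement( 'html' ) );
$html->appendChild( $doc->createElement( 'head' ) );
$body = $html->appendChild( $doc->createElement( 'body' ) );
$body->appendChild( $doc->createTextNode( 'partial content' ) );

// Detaching the body element takes its whole subtree with it.
$body->parentNode->removeChild( $body );
$html->appendChild( $doc->createElement( 'frameset' ) );

echo $html->childNodes->length, "\n"; // 2
```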
Wikimedia\RemexHtml\DOM\DOMBuilder::startDocument( $fragmentNamespace, $fragmentName )
Called when parsing starts.
  string|null $fragmentNamespace  The fragment namespace, or null to run in document mode.
  string|null $fragmentName  The fragment tag name, or null to run in document mode.
Implements Wikimedia\RemexHtml\TreeBuilder\TreeHandler.
int Wikimedia\RemexHtml\DOM\DOMBuilder::$quirks
The quirks mode.
May be either TreeBuilder::NO_QUIRKS, TreeBuilder::LIMITED_QUIRKS or TreeBuilder::QUIRKS to indicate no-quirks mode, limited-quirks mode or quirks mode respectively.
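A small sketch of turning the $quirks value into the mode names used above. The constants defined here are hypothetical stand-ins with arbitrary values; real code would compare against TreeBuilder::NO_QUIRKS, TreeBuilder::LIMITED_QUIRKS and TreeBuilder::QUIRKS instead. `quirksModeName()` is likewise a hypothetical helper.

```php
<?php
// Hypothetical stand-ins; use TreeBuilder::NO_QUIRKS, TreeBuilder::LIMITED_QUIRKS
// and TreeBuilder::QUIRKS in real code. The values here are arbitrary.
const NO_QUIRKS = 0;
const LIMITED_QUIRKS = 1;
const QUIRKS = 2;

/**
 * Map a quirks value to a human-readable mode name.
 */
function quirksModeName( int $quirks ): string {
	switch ( $quirks ) {
		case NO_QUIRKS:
			return 'no-quirks';
		case LIMITED_QUIRKS:
			return 'limited-quirks';
		case QUIRKS:
			return 'quirks';
		default:
			throw new InvalidArgumentException( "Unknown quirks value: $quirks" );
	}
}

echo quirksModeName( QUIRKS ), "\n"; // quirks
```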