RemexHtml
Fast HTML 5 parser
A TreeHandler which builds a serialized representation of a document, by encoding elements when the end tags are seen.
Public Member Functions | |
__construct (Formatter $formatter, $errorCallback=null) | |
Constructor. | |
string | getResult () |
Get the final string. | |
SerializerNode | getRootNode () |
Get the root SerializerNode. | |
SerializerNode | getParentNode (SerializerNode $node) |
Get the parent SerializerNode of a given SerializerNode. | |
SerializerNode|string|null | getLastChild (SerializerNode $node) |
Get the last child of a given SerializerNode. | |
startDocument ( $fragmentNamespace, $fragmentName) | |
Called when parsing starts. | |
endDocument ( $pos) | |
Called when parsing stops. | |
characters ( $preposition, $refElement, $text, $start, $length, $sourceStart, $sourceLength) | |
Insert characters. | |
endTag (Element $element, $sourceStart, $sourceLength) | |
A hint that an element was closed and was removed from the stack of open elements. | |
doctype ( $name, $public, $system, $quirks, $sourceStart, $sourceLength) | |
A valid DOCTYPE token was found. | |
comment ( $preposition, $refElement, $text, $sourceStart, $sourceLength) | |
Insert a comment. | |
error ( $text, $pos) | |
A parse error. | |
mergeAttributes (Element $element, Attributes $attrs, $sourceStart) | |
Add attributes to an existing element. | |
removeNode (Element $element, $sourceStart) | |
Remove a node from the tree, and all its children. | |
reparentChildren (Element $element, Element $newParent, $sourceStart) | |
Take all children of a given parent $element, and insert them as children of $newParent, removing them from their original parent in the process. | |
string | dump () |
Get a text representation of the current state of the serializer, for debugging. | |
Public Member Functions inherited from Wikimedia\RemexHtml\TreeBuilder\TreeHandler | |
insertElement ( $preposition, $ref, Element $element, $void, $sourceStart, $sourceLength) | |
Insert an element. | |
Protected Member Functions | |
interpretPlacement ( $preposition, $refElement) | |
Protected Attributes | |
SerializerNode[] | $nodes = array( ) |
All active SerializerNode objects in an array, so that they can be referred to by integer indexes. | |
A TreeHandler which builds a serialized representation of a document, by encoding elements when the end tags are seen.
This is faster than building a DOM and then serializing it, even if you use DOMDocument::saveHTML().
Wikimedia\RemexHtml\Serializer\Serializer::__construct ( Formatter $formatter, $errorCallback = null )
Constructor.
Formatter | $formatter | |
callable|null | $errorCallback | A function which is called with the details of each parse error |
Wikimedia\RemexHtml\Serializer\Serializer::characters ( $preposition, $ref, $text, $start, $length, $sourceStart, $sourceLength )
Insert characters.
int | $preposition | The placement of the new node with respect to $ref, given as one of the TreeBuilder:: placement constants. |
Element|null | $ref | Insert before/below this element, or null if $preposition is ROOT. |
string | $text | The text to insert is a substring of this string, with the start and length of the substring given by $start and $length. We do it this way to avoid unnecessary copying. |
int | $start | The start of the substring |
int | $length | The length of the substring |
int | $sourceStart | The input position. This is not necessarily accurate, particularly when the tokenizer is run without ignoreEntities, or in CDATA sections. |
int | $sourceLength | The length of the input which is consumed. The same caveats apply as for $sourceStart. |
Implements Wikimedia\RemexHtml\TreeBuilder\TreeHandler.
Reimplemented in Wikimedia\RemexHtml\Serializer\SerializerWithTracer.
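The substring convention used for $text, $start and $length can be illustrated with a minimal, self-contained sketch. The helper name `extractCharacters` is hypothetical and is not part of RemexHtml; it only shows how a handler might recover the text to insert from the three arguments.

```php
<?php
// Hypothetical helper (not part of RemexHtml): recover the text that a
// characters() event refers to from its $text, $start and $length arguments.
function extractCharacters( string $text, int $start, int $length ): string {
	// Only this slice of $text is the text to insert; the rest of the
	// string is surrounding data that the caller avoided copying.
	return substr( $text, $start, $length );
}

$buffer = '<p>Hello world</p>';
// The run "Hello world" starts at byte offset 3 and is 11 bytes long.
echo extractCharacters( $buffer, 3, 11 ), "\n";
```

Passing the whole buffer plus an offset and length means the caller never has to build an intermediate string for each text run; the copy happens only if and when the handler needs it.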
Wikimedia\RemexHtml\Serializer\Serializer::comment ( $preposition, $ref, $text, $sourceStart, $sourceLength )
Insert a comment.
int | $preposition | The placement of the new node with respect to $ref, given as one of the TreeBuilder:: placement constants. |
Element|null | $ref | Insert before/below this element, or null if $preposition is ROOT. |
string | $text | The text of the comment |
int | $sourceStart | The input position |
int | $sourceLength | The length of the input which is consumed |
Implements Wikimedia\RemexHtml\TreeBuilder\TreeHandler.
Reimplemented in Wikimedia\RemexHtml\Serializer\SerializerWithTracer.
Wikimedia\RemexHtml\Serializer\Serializer::doctype ( $name, $public, $system, $quirks, $sourceStart, $sourceLength )
A valid DOCTYPE token was found.
string | $name | The doctype name, usually "html" |
string | $public | The PUBLIC identifier |
string | $system | The SYSTEM identifier |
int | $quirks | The quirks mode implied by the doctype. |
int | $sourceStart | The input position |
int | $sourceLength | The length of the input which is consumed |
Implements Wikimedia\RemexHtml\TreeBuilder\TreeHandler.
Reimplemented in Wikimedia\RemexHtml\Serializer\SerializerWithTracer.
string Wikimedia\RemexHtml\Serializer\Serializer::dump ( )
Get a text representation of the current state of the serializer, for debugging.
Wikimedia\RemexHtml\Serializer\Serializer::endDocument ( $pos )
Called when parsing stops.
int | $pos | The input string length, i.e. the past-the-end position. |
Implements Wikimedia\RemexHtml\TreeBuilder\TreeHandler.
Reimplemented in Wikimedia\RemexHtml\Serializer\SerializerWithTracer.
Wikimedia\RemexHtml\Serializer\Serializer::endTag ( Element $element, $sourceStart, $sourceLength )
A hint that an element was closed and was removed from the stack of open elements.
It probably won't be mutated again.
Element | $element | The element being ended |
int | $sourceStart | The input position |
int | $sourceLength | The length of the input which is consumed |
Implements Wikimedia\RemexHtml\TreeBuilder\TreeHandler.
Reimplemented in Wikimedia\RemexHtml\Serializer\SerializerWithTracer.
Wikimedia\RemexHtml\Serializer\Serializer::error ( $text, $pos )
A parse error.
string | $text | An error message explaining in English what the author did wrong, and what the parser intends to do about the situation. |
int | $pos | The input position at which the error occurred |
Implements Wikimedia\RemexHtml\TreeBuilder\TreeHandler.
Reimplemented in Wikimedia\RemexHtml\Serializer\SerializerWithTracer.
SerializerNode|string|null Wikimedia\RemexHtml\Serializer\Serializer::getLastChild ( SerializerNode $node )
Get the last child of a given SerializerNode.
SerializerNode | $node |
SerializerNode Wikimedia\RemexHtml\Serializer\Serializer::getParentNode ( SerializerNode $node )
Get the parent SerializerNode of a given SerializerNode.
SerializerNode | $node |
string Wikimedia\RemexHtml\Serializer\Serializer::getResult ( )
Get the final string.
This can only be called after endDocument() is received.
Implements Wikimedia\RemexHtml\Serializer\AbstractSerializer.
SerializerNode Wikimedia\RemexHtml\Serializer\Serializer::getRootNode ( )
Get the root SerializerNode.
Wikimedia\RemexHtml\Serializer\Serializer::mergeAttributes ( Element $element, Attributes $attrs, $sourceStart )
Add attributes to an existing element.
This is used to update the attributes of the <html> or <body> elements. The event receiver should add only those attributes which the original element does not already have. It should not overwrite existing attributes.
Element | $element | The element to update |
Attributes | $attrs | The new attributes to add |
int | $sourceStart | The input position |
Implements Wikimedia\RemexHtml\TreeBuilder\TreeHandler.
Reimplemented in Wikimedia\RemexHtml\Serializer\SerializerWithTracer.
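The merge rule for mergeAttributes() (add what is missing, never overwrite) can be sketched with plain PHP arrays. The arrays stand in for the library's Element and Attributes objects, which is an assumption made to keep the example self-contained, and the function name `mergeMissingAttributes` is hypothetical.

```php
<?php
// Hypothetical helper: plain arrays stand in for RemexHtml's attribute objects.
function mergeMissingAttributes( array $existing, array $new ): array {
	// The array union operator keeps the left-hand value whenever a key
	// exists on both sides, so existing attributes are never overwritten.
	return $existing + $new;
}

$htmlAttrs = [ 'lang' => 'en' ];
$merged = mergeMissingAttributes( $htmlAttrs, [ 'lang' => 'fr', 'class' => 'no-js' ] );
// 'lang' keeps its original value; 'class' is added.
var_export( $merged );
```

Using `+` rather than `array_merge()` matters here: `array_merge()` would let the later value for a string key win, which is the opposite of the behaviour described.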
Wikimedia\RemexHtml\Serializer\Serializer::removeNode ( Element $element, $sourceStart )
Remove a node from the tree, and all its children.
This is only done when a <frameset> element is found, which triggers removal of the partially-constructed body element.
Element | $element | The element to remove |
int | $sourceStart | The location in the source at which this action was triggered. |
Implements Wikimedia\RemexHtml\TreeBuilder\TreeHandler.
Reimplemented in Wikimedia\RemexHtml\Serializer\SerializerWithTracer.
Wikimedia\RemexHtml\Serializer\Serializer::reparentChildren ( Element $element, Element $newParent, $sourceStart )
Take all children of a given parent $element, and insert them as children of $newParent, removing them from their original parent in the process.
Then insert $newParent as the only child of $element.
Element | $element | The old parent element |
Element | $newParent | The new parent element |
int | $sourceStart | The location in the source at which this action was triggered. |
Implements Wikimedia\RemexHtml\TreeBuilder\TreeHandler.
Reimplemented in Wikimedia\RemexHtml\Serializer\SerializerWithTracer.
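The effect of reparentChildren() on the tree shape can be sketched with a small stand-in node class. `SketchNode` and `reparentChildrenSketch` are hypothetical names used only for illustration; RemexHtml's own Element and SerializerNode classes are structured differently.

```php
<?php
// Hypothetical stand-in for a tree node; RemexHtml's own classes differ.
class SketchNode {
	public $name;
	public $children = [];

	public function __construct( string $name ) {
		$this->name = $name;
	}
}

function reparentChildrenSketch( SketchNode $element, SketchNode $newParent ): void {
	// Move every child of $element under $newParent, preserving order.
	foreach ( $element->children as $child ) {
		$newParent->children[] = $child;
	}
	// $newParent becomes the only child of $element.
	$element->children = [ $newParent ];
}

$a = new SketchNode( 'a' );
$a->children = [ new SketchNode( 'x' ), new SketchNode( 'y' ) ];
$b = new SketchNode( 'b' );
reparentChildrenSketch( $a, $b );
// The tree is now a -> b -> (x, y).
```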
Wikimedia\RemexHtml\Serializer\Serializer::startDocument ( $fragmentNamespace, $fragmentName )
Called when parsing starts.
string|null | $fragmentNamespace | The fragment namespace, or null to run in document mode. |
string|null | $fragmentName | The fragment tag name, or null to run in document mode. |
Implements Wikimedia\RemexHtml\TreeBuilder\TreeHandler.
Reimplemented in Wikimedia\RemexHtml\Serializer\SerializerWithTracer.
SerializerNode[] Wikimedia\RemexHtml\Serializer\Serializer::$nodes = array( ) (protected)
All active SerializerNode objects in an array, so that they can be referred to by integer indexes.
This is a way to emulate weak references, to avoid circular references, allowing nodes to be freed.
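The idea of referring to nodes by integer index instead of by object reference can be sketched as follows. `IndexedNode` and `NodeTable` are hypothetical classes written for this illustration and are not the library's implementation; they only show why an index lookup avoids a parent/child reference cycle.

```php
<?php
// Hypothetical sketch: a node stores its parent's integer index rather than
// a reference to the parent object, so parent and child never point at each
// other directly.
class IndexedNode {
	public $id;
	public $parentId;
	public $name;

	public function __construct( int $id, ?int $parentId, string $name ) {
		$this->id = $id;
		$this->parentId = $parentId;
		$this->name = $name;
	}
}

class NodeTable {
	/** @var IndexedNode[] Active nodes, keyed by integer index */
	private $nodes = [];
	private $nextId = 0;

	public function add( ?int $parentId, string $name ): IndexedNode {
		$node = new IndexedNode( $this->nextId++, $parentId, $name );
		$this->nodes[$node->id] = $node;
		return $node;
	}

	public function getParent( IndexedNode $node ): ?IndexedNode {
		if ( $node->parentId === null ) {
			return null;
		}
		// A released parent simply resolves to null, like a dead weak reference.
		return $this->nodes[$node->parentId] ?? null;
	}

	public function release( IndexedNode $node ): void {
		// Drop the table's reference; the node can be freed once no
		// other variable still holds it.
		unset( $this->nodes[$node->id] );
	}
}

$table = new NodeTable();
$root = $table->add( null, 'html' );
$body = $table->add( $root->id, 'body' );
$parentBefore = $table->getParent( $body );
$table->release( $root );
$parentAfter = $table->getParent( $body );
```

Because the only strong reference to each node lives in the table, removing the table entry is enough to make the node collectable, which is the property a weak reference would otherwise provide.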