RemexHtml
Fast HTML 5 parser
Wikimedia\RemexHtml\Serializer\Serializer Class Reference

A TreeHandler which builds a serialized representation of a document, by encoding elements when the end tags are seen. More...

Public Member Functions

 __construct (Formatter $formatter, $errorCallback=null)
 Constructor.
 
string getResult ()
 Get the final string.
 
SerializerNode getRootNode ()
 Get the root SerializerNode.
 
SerializerNode getParentNode (SerializerNode $node)
 Get the parent SerializerNode of a given SerializerNode.
 
SerializerNode|string|null getLastChild (SerializerNode $node)
 Get the last child of a given SerializerNode.
 
 startDocument ( $fragmentNamespace, $fragmentName)
 Called when parsing starts.
 
 endDocument ( $pos)
 Called when parsing stops.
 
 characters ( $preposition, $refElement, $text, $start, $length, $sourceStart, $sourceLength)
 Insert characters.
 
 endTag (Element $element, $sourceStart, $sourceLength)
 A hint that an element was closed and was removed from the stack of open elements.
 
 doctype ( $name, $public, $system, $quirks, $sourceStart, $sourceLength)
 A valid DOCTYPE token was found.
 
 comment ( $preposition, $refElement, $text, $sourceStart, $sourceLength)
 Insert a comment.
 
 error ( $text, $pos)
 A parse error.
 
 mergeAttributes (Element $element, Attributes $attrs, $sourceStart)
 Add attributes to an existing element.
 
 removeNode (Element $element, $sourceStart)
 Remove a node from the tree, and all its children.
 
 reparentChildren (Element $element, Element $newParent, $sourceStart)
 Take all children of a given parent $element, and insert them as children of $newParent, removing them from their original parent in the process.
 
string dump ()
 Get a text representation of the current state of the serializer, for debugging.
 
Public Member Functions inherited from Wikimedia\RemexHtml\TreeBuilder\TreeHandler
 insertElement ( $preposition, $ref, Element $element, $void, $sourceStart, $sourceLength)
 Insert an element.
 

Protected Member Functions

 interpretPlacement ( $preposition, $refElement)
 

Protected Attributes

SerializerNode[] $nodes = array( )
 All active SerializerNode objects in an array, so that they can be referred to by integer indexes.
 

Detailed Description

A TreeHandler which builds a serialized representation of a document, by encoding elements when the end tags are seen.

This is faster than building a DOM and then serializing it, even if you use DOMDocument::saveHTML().
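A minimal usage sketch of how the class is typically wired into a parsing pipeline. It assumes the wikimedia/remex-html package is installed and the Composer autoloader has already been loaded. HtmlFormatter, TreeBuilder, Dispatcher and Tokenizer are other classes of the library that are not documented on this page, and the function name reserializeHtml is hypothetical.

```php
<?php
use Wikimedia\RemexHtml\Serializer\HtmlFormatter;
use Wikimedia\RemexHtml\Serializer\Serializer;
use Wikimedia\RemexHtml\Tokenizer\Tokenizer;
use Wikimedia\RemexHtml\TreeBuilder\Dispatcher;
use Wikimedia\RemexHtml\TreeBuilder\TreeBuilder;

/**
 * Parse an HTML string and serialize it again without building a DOM.
 * Hypothetical helper; assumes the library classes are autoloadable.
 */
function reserializeHtml( string $html ): string {
    $formatter = new HtmlFormatter();
    // The Serializer is the TreeHandler that receives tree construction events.
    $serializer = new Serializer( $formatter );
    $treeBuilder = new TreeBuilder( $serializer, [] );
    $dispatcher = new Dispatcher( $treeBuilder );
    $tokenizer = new Tokenizer( $dispatcher, $html, [] );
    // Running the tokenizer drives the whole pipeline up to endDocument().
    $tokenizer->execute();
    // getResult() is only valid once endDocument() has been received.
    return $serializer->getResult();
}
```

The result is read only after execute() returns, which matches the requirement that getResult() be called after endDocument().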

Constructor & Destructor Documentation

◆ __construct()

Wikimedia\RemexHtml\Serializer\Serializer::__construct ( Formatter $formatter,
$errorCallback = null )

Constructor.

Parameters
Formatter $formatter
callable|null $errorCallback: A function which is called with the details of each parse error

Member Function Documentation

◆ characters()

Wikimedia\RemexHtml\Serializer\Serializer::characters ( $preposition,
$ref,
$text,
$start,
$length,
$sourceStart,
$sourceLength )

Insert characters.

Parameters
int $preposition: The placement of the new node with respect to $ref. May be TreeBuilder::
  • BEFORE: insert as a sibling before the reference element
  • UNDER: append as the last child of the reference element
  • ROOT: append as the last child of the document node
Element|null $ref: Insert before/below this element, or null if $preposition is ROOT.
string $text: The text to insert is a substring of this string, with the start and length of the substring given by $start and $length. We do it this way to avoid unnecessary copying.
int $start: The start of the substring
int $length: The length of the substring
int $sourceStart: The input position. This is not necessarily accurate, particularly when the tokenizer is run without ignoreEntities, or in CDATA sections.
int $sourceLength: The length of the input which is consumed. The same caveats apply as for $sourceStart.
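The ($text, $start, $length) convention can be illustrated with a small sketch. The TextCollector class below is hypothetical and not part of RemexHtml; it takes only the three text-related arguments and shows how a receiver extracts the substring itself, so the caller never has to slice the string before each call.

```php
<?php
// Hypothetical receiver, for illustration only. The real characters() method
// takes more parameters; only the text-related ones are modelled here.
final class TextCollector {
    /** @var string Text accumulated so far */
    public string $buffer = '';

    public function characters( string $text, int $start, int $length ): void {
        // Copy only the requested window out of the full string.
        $this->buffer .= substr( $text, $start, $length );
    }
}

$source = '<p>Hello</p><p>world</p>';
$collector = new TextCollector();
// Offsets 3..7 hold "Hello" and offsets 15..19 hold "world".
$collector->characters( $source, 3, 5 );
$collector->characters( $source, 15, 5 );
echo $collector->buffer, "\n"; // prints "Helloworld"
```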

Implements Wikimedia\RemexHtml\TreeBuilder\TreeHandler.

Reimplemented in Wikimedia\RemexHtml\Serializer\SerializerWithTracer.

◆ comment()

Wikimedia\RemexHtml\Serializer\Serializer::comment ( $preposition,
$ref,
$text,
$sourceStart,
$sourceLength )

Insert a comment.

Parameters
int $preposition: The placement of the new node with respect to $ref. May be TreeBuilder::
  • BEFORE: insert as a sibling before the reference element
  • UNDER: append as the last child of the reference element
  • ROOT: append as the last child of the document node
Element|null $ref: Insert before/below this element, or null if $preposition is ROOT.
string $text: The text of the comment
int $sourceStart: The input position
int $sourceLength: The length of the input which is consumed

Implements Wikimedia\RemexHtml\TreeBuilder\TreeHandler.

Reimplemented in Wikimedia\RemexHtml\Serializer\SerializerWithTracer.

◆ doctype()

Wikimedia\RemexHtml\Serializer\Serializer::doctype ( $name,
$public,
$system,
$quirks,
$sourceStart,
$sourceLength )

A valid DOCTYPE token was found.

Parameters
string $name: The doctype name, usually "html"
string $public: The PUBLIC identifier
string $system: The SYSTEM identifier
int $quirks: The quirks mode implied from the doctype. One of:
  • TreeBuilder::NO_QUIRKS : no quirks
  • TreeBuilder::LIMITED_QUIRKS : limited quirks
  • TreeBuilder::QUIRKS : full quirks
int $sourceStart: The input position
int $sourceLength: The length of the input which is consumed

Implements Wikimedia\RemexHtml\TreeBuilder\TreeHandler.

Reimplemented in Wikimedia\RemexHtml\Serializer\SerializerWithTracer.

◆ dump()

string Wikimedia\RemexHtml\Serializer\Serializer::dump ( )

Get a text representation of the current state of the serializer, for debugging.

Returns
string

◆ endDocument()

Wikimedia\RemexHtml\Serializer\Serializer::endDocument ( $pos)

Called when parsing stops.

Parameters
int $pos: The input string length, i.e. the past-the-end position.

Implements Wikimedia\RemexHtml\TreeBuilder\TreeHandler.

Reimplemented in Wikimedia\RemexHtml\Serializer\SerializerWithTracer.

◆ endTag()

Wikimedia\RemexHtml\Serializer\Serializer::endTag ( Element $element,
$sourceStart,
$sourceLength )

A hint that an element was closed and was removed from the stack of open elements.

The element probably won't be mutated again.

Parameters
Element $element: The element being ended
int $sourceStart: The input position
int $sourceLength: The length of the input which is consumed

Implements Wikimedia\RemexHtml\TreeBuilder\TreeHandler.

Reimplemented in Wikimedia\RemexHtml\Serializer\SerializerWithTracer.

◆ error()

Wikimedia\RemexHtml\Serializer\Serializer::error ( $text,
$pos )

A parse error.

Parameters
string $text: An error message explaining in English what the author did wrong, and what the parser intends to do about the situation.
int $pos: The input position at which the error occurred

Implements Wikimedia\RemexHtml\TreeBuilder\TreeHandler.

Reimplemented in Wikimedia\RemexHtml\Serializer\SerializerWithTracer.

◆ getLastChild()

SerializerNode|string|null Wikimedia\RemexHtml\Serializer\Serializer::getLastChild ( SerializerNode $node)

Get the last child of a given SerializerNode.

Parameters
SerializerNode $node
Returns
SerializerNode|string|null

◆ getParentNode()

SerializerNode Wikimedia\RemexHtml\Serializer\Serializer::getParentNode ( SerializerNode $node)

Get the parent SerializerNode of a given SerializerNode.

Parameters
SerializerNode $node
Returns
SerializerNode

◆ getResult()

string Wikimedia\RemexHtml\Serializer\Serializer::getResult ( )

Get the final string.

This can only be called after endDocument() is received.

Returns
string

Implements Wikimedia\RemexHtml\Serializer\AbstractSerializer.

◆ getRootNode()

SerializerNode Wikimedia\RemexHtml\Serializer\Serializer::getRootNode ( )

Get the root SerializerNode.

Returns
SerializerNode

◆ mergeAttributes()

Wikimedia\RemexHtml\Serializer\Serializer::mergeAttributes ( Element $element,
Attributes $attrs,
$sourceStart )

Add attributes to an existing element.

This is used to update the attributes of the <html> or <body> elements. The event receiver should add only those attributes which the original element does not already have. It should not overwrite existing attributes.

Parameters
Element $element: The element to update
Attributes $attrs: The new attributes to add
int $sourceStart: The input position
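The "add only missing attributes" rule can be sketched with plain PHP arrays. This is an illustration of the semantics only: the helper mergeMissingAttributes is hypothetical, and associative arrays stand in for the library's Element and Attributes objects.

```php
<?php
/**
 * Hypothetical helper: merge $new into $existing without overwriting.
 * Plain arrays stand in for the library's attribute objects.
 */
function mergeMissingAttributes( array $existing, array $new ): array {
    // The array union operator keeps the left-hand value for duplicate keys,
    // so attributes already present on the element are never replaced.
    return $existing + $new;
}

$htmlAttrs = [ 'lang' => 'en' ];
$merged = mergeMissingAttributes( $htmlAttrs, [ 'lang' => 'fr', 'class' => 'no-js' ] );
var_export( $merged );
```

Here 'lang' keeps its original value 'en', while 'class' is added because the element did not have it.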

Implements Wikimedia\RemexHtml\TreeBuilder\TreeHandler.

Reimplemented in Wikimedia\RemexHtml\Serializer\SerializerWithTracer.

◆ removeNode()

Wikimedia\RemexHtml\Serializer\Serializer::removeNode ( Element $element,
$sourceStart )

Remove a node from the tree, and all its children.

This is only done when a <frameset> element is found, which triggers removal of the partially-constructed body element.

Parameters
Element $element: The element to remove
int $sourceStart: The location in the source at which this action was triggered.

Implements Wikimedia\RemexHtml\TreeBuilder\TreeHandler.

Reimplemented in Wikimedia\RemexHtml\Serializer\SerializerWithTracer.

◆ reparentChildren()

Wikimedia\RemexHtml\Serializer\Serializer::reparentChildren ( Element $element,
Element $newParent,
$sourceStart )

Take all children of a given parent $element, and insert them as children of $newParent, removing them from their original parent in the process.

Then insert $newParent as the only child of $element.

Parameters
Element $element: The old parent element
Element $newParent: The new parent element
int $sourceStart: The location in the source at which this action was triggered.
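The tree operation described above can be sketched on a toy tree. The TreeNode class and the sketchReparentChildren function are hypothetical stand-ins, not the library's implementation, which works on serialized node state rather than a simple child array.

```php
<?php
// Hypothetical toy node, for illustration only.
final class TreeNode {
    public string $name;
    /** @var TreeNode[] */
    public array $children = [];

    public function __construct( string $name ) {
        $this->name = $name;
    }
}

function sketchReparentChildren( TreeNode $element, TreeNode $newParent ): void {
    // Move every existing child of $element under $newParent, keeping order.
    foreach ( $element->children as $child ) {
        $newParent->children[] = $child;
    }
    // $newParent becomes the only child of $element.
    $element->children = [ $newParent ];
}

$p = new TreeNode( 'p' );
$p->children = [ new TreeNode( 'a' ), new TreeNode( 'b' ) ];
$i = new TreeNode( 'i' );
sketchReparentChildren( $p, $i );
// $p now contains only <i>, and <i> contains the former children <a> and <b>.
```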

Implements Wikimedia\RemexHtml\TreeBuilder\TreeHandler.

Reimplemented in Wikimedia\RemexHtml\Serializer\SerializerWithTracer.

◆ startDocument()

Wikimedia\RemexHtml\Serializer\Serializer::startDocument ( $fragmentNamespace,
$fragmentName )

Called when parsing starts.

Parameters
string|null $fragmentNamespace: The fragment namespace, or null to run in document mode.
string|null $fragmentName: The fragment tag name, or null to run in document mode.

Implements Wikimedia\RemexHtml\TreeBuilder\TreeHandler.

Reimplemented in Wikimedia\RemexHtml\Serializer\SerializerWithTracer.

Member Data Documentation

◆ $nodes

SerializerNode[] Wikimedia\RemexHtml\Serializer\Serializer::$nodes = array( )
protected

All active SerializerNode objects in an array, so that they can be referred to by integer indexes.

This is a way to emulate weak references, to avoid circular references, allowing nodes to be freed.
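The idea of referring to nodes by integer index can be sketched as follows. SketchNode and SketchRegistry are hypothetical classes written for this illustration; they are loosely modelled on the description above and do not reproduce the actual SerializerNode fields.

```php
<?php
// Hypothetical node: it stores its parent's integer index, not the object.
final class SketchNode {
    public int $id;
    public ?int $parentId;
    public string $name;

    public function __construct( int $id, ?int $parentId, string $name ) {
        $this->id = $id;
        $this->parentId = $parentId;
        $this->name = $name;
    }
}

final class SketchRegistry {
    /** @var SketchNode[] Active nodes, keyed by integer index */
    private array $nodes = [];
    private int $nextId = 0;

    public function add( string $name, ?int $parentId = null ): SketchNode {
        $id = $this->nextId++;
        $node = new SketchNode( $id, $parentId, $name );
        $this->nodes[$id] = $node;
        return $node;
    }

    public function get( ?int $id ): ?SketchNode {
        // A released or unknown index simply resolves to null.
        return $id === null ? null : ( $this->nodes[$id] ?? null );
    }

    public function release( int $id ): void {
        // Dropping the array entry removes the registry's reference to the node.
        unset( $this->nodes[$id] );
    }
}

$registry = new SketchRegistry();
$root = $registry->add( 'html' );
$body = $registry->add( 'body', $root->id );
echo $registry->get( $body->parentId )->name, "\n"; // prints "html"
$registry->release( $root->id );
var_dump( $registry->get( $body->parentId ) ); // the parent no longer resolves
```

Because a child holds only an integer, parent and child never reference each other directly, so no reference cycle forms and a node can be freed as soon as its array entry and any other references are gone.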

