RemexHtml
Fast HTML 5 parser
Wikimedia\RemexHtml\DOM\DOMBuilder Class Reference

A TreeHandler which constructs a DOMDocument.

Public Member Functions

 __construct ( $options=[])
 
DOMNode getFragment ()
 Get the constructed document or document fragment.
 
bool isCoerced ()
 Returns true if the document was coerced due to libxml limitations.
 
 startDocument ( $fragmentNamespace, $fragmentName)
 Called when parsing starts.
 
 endDocument ( $pos)
 Called when parsing stops.
 
 insertElement ( $preposition, $refElement, Element $element, $void, $sourceStart, $sourceLength)
 Insert an element.
 
 endTag (Element $element, $sourceStart, $sourceLength)
 A hint that an element was closed and was removed from the stack of open elements.
 
 doctype ( $name, $public, $system, $quirks, $sourceStart, $sourceLength)
 A valid DOCTYPE token was found.
 
 comment ( $preposition, $refElement, $text, $sourceStart, $sourceLength)
 Insert a comment.
 
 error ( $text, $pos)
 A parse error.
 
 removeNode (Element $element, $sourceStart)
 Remove a node from the tree, and all its children.
 
Public Member Functions inherited from Wikimedia\RemexHtml\TreeBuilder\TreeHandler
 characters ( $preposition, $ref, $text, $start, $length, $sourceStart, $sourceLength)
 Insert characters.
 
 mergeAttributes (Element $element, Attributes $attrs, $sourceStart)
 Add attributes to an existing element.
 
 reparentChildren (Element $element, Element $newParent, $sourceStart)
 Take all children of a given parent $element, and insert them as children of $newParent, removing them from their original parent in the process.
 

Public Attributes

string | null $doctypeName
 The name of the input document type.
 
string | null $public
 The public ID.
 
string | null $system
 The system ID.
 
int $quirks
 The quirks mode.
 

Protected Member Functions

DOMDocument createDocument (string $doctypeName=null, string $public=null, string $system=null)
 
 insertNode ( $preposition, $refElement, $node)
 

Detailed Description

A TreeHandler which constructs a DOMDocument.

Note that this class permits third-party DOMImplementations (documents other than \DOMDocument, nodes other than \DOMNode, etc) and so no enforced PHP type hints are used which name these classes directly. For the sake of static type checking, the types in comments are given as if the standard PHP \DOM* classes are being used but at runtime everything is duck-typed.

Constructor & Destructor Documentation

◆ __construct()

Wikimedia\RemexHtml\DOM\DOMBuilder::__construct ( $options = [])
Parameters
array $options: An associative array of options:
  • errorCallback : A function which is called on parse errors
  • suppressHtmlNamespace : omit the namespace when creating HTML elements. False by default.
  • suppressIdAttribute : don't call the nonstandard DOMElement::setIdAttribute() method while constructing elements. False by default (this method is needed for efficient DOMDocument::getElementById() calls). Set to true if you are using a W3C spec-compliant DOMImplementation and wish to avoid nonstandard calls.
  • domImplementation: The DOMImplementation object to use. If this parameter is missing or null, a new DOMImplementation object will be constructed using the domImplementationClass option value. You can use a third-party DOM implementation by passing in an appropriately duck-typed object here.
  • domImplementationClass: The string name of the DOMImplementation class to use. Defaults to \DOMImplementation::class but you can use a third-party DOM implementation by passing an alternative class name here.
  • domExceptionClass: The string name of the DOMException class to use. Defaults to \DOMException::class but you can use a third-party DOM implementation by passing an alternative class name here.
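
As a rough illustration of the options listed above, the following sketch builds an options array and passes it to the constructor only if the RemexHtml class can be loaded. It assumes that errorCallback is invoked with the same ($text, $pos) pair that error() receives; the $errors array and the message format are illustrative choices, not part of the library.

```php
<?php
// Collect parse errors reported through the errorCallback option.
$errors = [];

$options = [
    // Assumption: the callback receives the same ($text, $pos) pair as error().
    'errorCallback' => static function ($text, $pos) use (&$errors) {
        $errors[] = sprintf('%s (at input position %d)', $text, $pos);
    },
    // The documented defaults, spelled out explicitly.
    'suppressHtmlNamespace' => false,
    'suppressIdAttribute' => false,
    'domImplementationClass' => \DOMImplementation::class,
    'domExceptionClass' => \DOMException::class,
];

// Only construct the builder when RemexHtml is actually installed.
$builderClass = 'Wikimedia\RemexHtml\DOM\DOMBuilder';
$builder = class_exists($builderClass) ? new $builderClass($options) : null;
```

With a third-party DOM implementation, domImplementationClass and domExceptionClass would name that implementation's classes instead, and suppressIdAttribute would probably be set to true.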

Member Function Documentation

◆ comment()

Wikimedia\RemexHtml\DOM\DOMBuilder::comment ( $preposition,
$ref,
$text,
$sourceStart,
$sourceLength )

Insert a comment.

Parameters
int $preposition: The placement of the new node with respect to $ref. One of the following TreeBuilder constants:
  • BEFORE: insert as a sibling before the reference element
  • UNDER: append as the last child of the reference element
  • ROOT: append as the last child of the document node
Element | null $ref: Insert before/below this element, or null if $preposition is ROOT.
string $text: The text of the comment
int $sourceStart: The input position
int $sourceLength: The length of the input which is consumed

Implements Wikimedia\RemexHtml\TreeBuilder\TreeHandler.

◆ createDocument()

DOMDocument Wikimedia\RemexHtml\DOM\DOMBuilder::createDocument ( string $doctypeName = null,
string $public = null,
string $system = null )
protected
Parameters
string | null $doctypeName
string | null $public
string | null $system
Returns
\DOMDocument

◆ doctype()

Wikimedia\RemexHtml\DOM\DOMBuilder::doctype ( $name,
$public,
$system,
$quirks,
$sourceStart,
$sourceLength )

A valid DOCTYPE token was found.

Parameters
string $name: The doctype name, usually "html"
string $public: The PUBLIC identifier
string $system: The SYSTEM identifier
int $quirks: The quirks mode implied from the doctype. One of:
  • TreeBuilder::NO_QUIRKS : no quirks
  • TreeBuilder::LIMITED_QUIRKS : limited quirks
  • TreeBuilder::QUIRKS : full quirks
int $sourceStart: The input position
int $sourceLength: The length of the input which is consumed

Implements Wikimedia\RemexHtml\TreeBuilder\TreeHandler.

◆ endDocument()

Wikimedia\RemexHtml\DOM\DOMBuilder::endDocument ( $pos)

Called when parsing stops.

Parameters
int $pos: The input string length, i.e. the past-the-end position.

Implements Wikimedia\RemexHtml\TreeBuilder\TreeHandler.

◆ endTag()

Wikimedia\RemexHtml\DOM\DOMBuilder::endTag ( Element $element,
$sourceStart,
$sourceLength )

A hint that an element was closed and was removed from the stack of open elements.

It probably won't be mutated again.

Parameters
Element $element: The element being ended
int $sourceStart: The input position
int $sourceLength: The length of the input which is consumed

Implements Wikimedia\RemexHtml\TreeBuilder\TreeHandler.

◆ error()

Wikimedia\RemexHtml\DOM\DOMBuilder::error ( $text,
$pos )

A parse error.

Parameters
string $text: An error message explaining in English what the author did wrong, and what the parser intends to do about the situation.
int $pos: The input position at which the error occurred

Implements Wikimedia\RemexHtml\TreeBuilder\TreeHandler.

◆ getFragment()

DOMNode Wikimedia\RemexHtml\DOM\DOMBuilder::getFragment ( )

Get the constructed document or document fragment.

In the fragment case, a DOMElement is returned, and the caller is expected to extract its inner contents, ignoring the wrapping element. This convention is convenient because the wrapping element gives libxml somewhere to put its namespace declarations. If we copied the children into a DOMDocumentFragment, libxml would invent new prefixes for the orphaned namespaces.

Returns
\DOMNode
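
The fragment convention described above can be sketched with PHP's built-in DOM classes alone. Here a hand-built `div` stands in for the wrapping element that getFragment() would return in the fragment case; the helper name fragmentChildren() and the choice of `div` are hypothetical, used only for illustration.

```php
<?php
/**
 * Return the child nodes of a wrapping element, ignoring the wrapper itself.
 * Hypothetical helper, not part of RemexHtml.
 */
function fragmentChildren(\DOMNode $wrapper): array {
    return iterator_to_array($wrapper->childNodes, false);
}

// Stand-in for the wrapping element returned by a fragment parse.
$doc = new \DOMDocument();
$wrapper = $doc->createElement('div');
$doc->appendChild($wrapper);
$wrapper->appendChild($doc->createElement('p', 'Hello'));
$wrapper->appendChild($doc->createTextNode(' world'));

// Serialize only the inner contents; the wrapper is never written out.
$inner = '';
foreach (fragmentChildren($wrapper) as $child) {
    $inner .= $doc->saveHTML($child);
}
```

The children stay attached to the wrapper throughout, so any namespace declarations it carries remain in scope while they are serialized.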

◆ insertElement()

Wikimedia\RemexHtml\DOM\DOMBuilder::insertElement ( $preposition,
$ref,
Element $element,
$void,
$sourceStart,
$sourceLength )

Insert an element.

The element name and attributes are given in the supplied Element object. Handlers for this event typically attach an identifier to the userData property of the Element object, to identify the element when it is used again in subsequent tree mutations.

Parameters
int $preposition: The placement of the new node with respect to $ref. One of the following TreeBuilder constants:
  • BEFORE: insert as a sibling before the reference element
  • UNDER: append as the last child of the reference element
  • ROOT: append as the last child of the document node
Element | null $ref: Insert before/below this element, or null if $preposition is ROOT.
Element $element: An object containing information about the new element. The same object will be used for $parent and $refNode in other calls as appropriate. The handler can set $element->userData to attach a suitable DOM object to identify the mutation target in subsequent calls.
bool $void: True if this is a void element which cannot have any children appended to it. This is usually true if the element is closed by the same token that opened it. No endTag() event will be sent for such an element. This is only true if self-closing tags are acknowledged for this tag name, so it is a hint to the serializer that a self-closing tag is acceptable.
int $sourceStart: The input position
int $sourceLength: The length of the input which is consumed

Implements Wikimedia\RemexHtml\TreeBuilder\TreeHandler.
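
To make the three placements concrete, here is a minimal sketch of how BEFORE, UNDER and ROOT might map onto standard DOM operations. It is not the library's implementation: the PLACE_* constants are hypothetical stand-ins for the TreeBuilder constants (whose actual values are not given here), and placeNode() is an illustrative helper.

```php
<?php
// Hypothetical stand-ins for the TreeBuilder placement constants.
const PLACE_BEFORE = 'before';
const PLACE_UNDER = 'under';
const PLACE_ROOT = 'root';

/**
 * Insert $node relative to $ref according to $preposition.
 * $ref may be null only when $preposition is PLACE_ROOT.
 */
function placeNode(\DOMDocument $doc, string $preposition, ?\DOMNode $ref, \DOMNode $node): void {
    switch ($preposition) {
        case PLACE_BEFORE:
            // Insert as a sibling immediately before the reference node.
            $ref->parentNode->insertBefore($node, $ref);
            break;
        case PLACE_UNDER:
            // Append as the last child of the reference node.
            $ref->appendChild($node);
            break;
        case PLACE_ROOT:
            // Append as the last child of the document node.
            $doc->appendChild($node);
            break;
        default:
            throw new \InvalidArgumentException("Unknown preposition: $preposition");
    }
}

$doc = new \DOMDocument();
$html = $doc->createElement('html');
placeNode($doc, PLACE_ROOT, null, $html);

$body = $doc->createElement('body');
placeNode($doc, PLACE_UNDER, $html, $body);

// BEFORE places the new node ahead of an existing sibling.
$head = $doc->createElement('head');
placeNode($doc, PLACE_BEFORE, $body, $head);
```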

◆ isCoerced()

bool Wikimedia\RemexHtml\DOM\DOMBuilder::isCoerced ( )

Returns true if the document was coerced due to libxml limitations.

We follow HTML 5.1 § 8.2.7 "Coercing an HTML DOM into an infoset".

Returns
bool
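
For context, the kind of coercion that section of the HTML specification describes can be sketched as follows. This is a simplified, ASCII-only approximation written for illustration, not RemexHtml's code: coerceXmlName() is a hypothetical helper, bytes outside ASCII are passed through untouched, and empty names are not handled. It replaces each character that is not allowed in an XML name with "U" followed by six uppercase hexadecimal digits, which is the scheme the specification illustrates with the name "foo<bar".

```php
<?php
/**
 * Coerce a name into something acceptable as an XML name.
 * Simplified sketch: only ASCII characters are checked.
 */
function coerceXmlName(string $name): string {
    $escape = static function (array $m): string {
        // "U" followed by the code point as six uppercase hex digits.
        return sprintf('U%06X', ord($m[0]));
    };
    // Escape ASCII characters that may not appear anywhere in an XML name.
    $name = (string)preg_replace_callback('/[^A-Za-z0-9_.:\x80-\xFF-]/', $escape, $name);
    // Escape a leading digit, dot or hyphen, which may not start an XML name.
    return (string)preg_replace_callback('/^[0-9.-]/', $escape, $name);
}

$doc = new \DOMDocument();
try {
    // A libxml-backed DOMDocument is expected to refuse this name.
    $doc->createElement('foo<bar');
    $rejected = false;
} catch (\DOMException $e) {
    $rejected = true;
}

// The coerced name can be used to create the element instead.
$element = $doc->createElement(coerceXmlName('foo<bar'));
```

A caller can check isCoerced() after parsing to learn whether any such substitution took place in the constructed document.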

◆ removeNode()

Wikimedia\RemexHtml\DOM\DOMBuilder::removeNode ( Element $element,
$sourceStart )

Remove a node from the tree, and all its children.

This is only done when a <frameset> element is found, which triggers removal of the partially-constructed body element.

Parameters
Element $element: The element to remove
int $sourceStart: The location in the source at which this action was triggered.

Implements Wikimedia\RemexHtml\TreeBuilder\TreeHandler.

◆ startDocument()

Wikimedia\RemexHtml\DOM\DOMBuilder::startDocument ( $fragmentNamespace,
$fragmentName )

Called when parsing starts.

Parameters
string | null $fragmentNamespace: The fragment namespace, or null to run in document mode.
string | null $fragmentName: The fragment tag name, or null to run in document mode.

Implements Wikimedia\RemexHtml\TreeBuilder\TreeHandler.

Member Data Documentation

◆ $quirks

int Wikimedia\RemexHtml\DOM\DOMBuilder::$quirks

The quirks mode.

May be either TreeBuilder::NO_QUIRKS, TreeBuilder::LIMITED_QUIRKS or TreeBuilder::QUIRKS to indicate no-quirks mode, limited-quirks mode or quirks mode respectively.


The documentation for this class was generated from the following file: