RemexHtml
Fast HTML 5 parser
Wikimedia\RemexHtml\Tokenizer\TokenGeneratorHandler Class Reference

The handler which converts events to token arrays for TokenGenerator.


Public Member Functions

 startDocument (Tokenizer $tokenizer, $fragmentNamespace, $fragmentName)
 Called once at the start of the document (STATE_START).
 
 endDocument ( $pos)
 Called when the end of the input string is consumed.
 
 error ( $text, $pos)
 This is called for "parse errors" (as defined by the spec).
 
 characters ( $text, $start, $length, $sourceStart, $sourceLength)
 A merged sequence of character tokens.
 
 startTag ( $name, Attributes $attrs, $selfClose, $sourceStart, $sourceLength)
 A start tag event.
 
 endTag ( $name, $sourceStart, $sourceLength)
 An end tag event.
 
 doctype ( $name, $public, $system, $quirks, $sourceStart, $sourceLength)
 A DOCTYPE declaration.
 
 comment ( $text, $sourceStart, $sourceLength)
 A comment.
 

Public Attributes

 $tokens = array( )
 

Detailed Description

The handler which converts events to token arrays for TokenGenerator.
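
To illustrate the idea, the following sketch shows a simplified stand-in for such a handler: each event method appends one associative array to a public $tokens list, which a consumer can then read in order. The class name SketchTokenCollector and the array keys ('type', 'name', 'text' and so on) are assumptions made for this example, not RemexHtml's actual token layout, and the class does not implement the real TokenHandler interface.

```php
<?php
// Hypothetical stand-in for illustration only; not the RemexHtml class.
class SketchTokenCollector {
    /** @var array[] Token arrays, in the order the events arrived */
    public $tokens = [];

    public function startTag( $name, array $attrs, $selfClose, $sourceStart, $sourceLength ) {
        $this->tokens[] = [
            'type' => 'startTag',
            'name' => $name,
            'attrs' => $attrs,
            'selfClose' => $selfClose,
            'sourceStart' => $sourceStart,
            'sourceLength' => $sourceLength,
        ];
    }

    public function characters( $text, $start, $length, $sourceStart, $sourceLength ) {
        $this->tokens[] = [
            'type' => 'text',
            // The handler performs the substring operation itself.
            'text' => substr( $text, $start, $length ),
            'sourceStart' => $sourceStart,
            'sourceLength' => $sourceLength,
        ];
    }

    public function endTag( $name, $sourceStart, $sourceLength ) {
        $this->tokens[] = [
            'type' => 'endTag',
            'name' => $name,
            'sourceStart' => $sourceStart,
            'sourceLength' => $sourceLength,
        ];
    }
}

// Simulate the events a tokenizer might emit for "<b>hi</b>".
$handler = new SketchTokenCollector();
$handler->startTag( 'b', [], false, 0, 3 );
$handler->characters( '<b>hi</b>', 3, 2, 3, 2 );
$handler->endTag( 'b', 5, 4 );

// Each event became one array in $handler->tokens.
$types = array_column( $handler->tokens, 'type' );
```

Buffering events as plain arrays lets the caller pull tokens at its own pace instead of reacting to callbacks.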

Member Function Documentation

◆ characters()

Wikimedia\RemexHtml\Tokenizer\TokenGeneratorHandler::characters ( $text,
$start,
$length,
$sourceStart,
$sourceLength )

A merged sequence of character tokens.

We use the SAX-like convention of requiring the handler to do the substring operation, i.e. the actual text is substr( $text, $start, $length ), since this allows us to avoid some copying, at least if ignoreCharRefs and ignoreNulls are enabled.

Parameters
string  $text          The string which contains the emitted characters
int     $start         The start of the range within $text to use
int     $length        The length of the range within $text to use
int     $sourceStart   The input position
int     $sourceLength  The input length

Implements Wikimedia\RemexHtml\Tokenizer\TokenHandler.
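
As a rough sketch of why this convention can save work, the handler below always counts the text length but only calls substr() when it actually needs the text. SketchTextHandler and its $keepText flag are hypothetical names for this example, and the calls at the bottom assume the tokenizer passes the whole input string as $text, which may not hold when character references or nulls have to be rewritten.

```php
<?php
// Hypothetical handler fragment, for illustration only.
class SketchTextHandler {
    /** @var int Total length of all text seen so far */
    public $byteCount = 0;
    /** @var string[] Extracted text, only filled when $keepText is true */
    public $chunks = [];
    /** @var bool */
    private $keepText;

    public function __construct( $keepText ) {
        $this->keepText = $keepText;
    }

    public function characters( $text, $start, $length, $sourceStart, $sourceLength ) {
        // Counting needs no copy: the range is described by $start and $length.
        $this->byteCount += $length;
        if ( $this->keepText ) {
            // The copy happens here, in the handler, and only on demand.
            $this->chunks[] = substr( $text, $start, $length );
        }
    }
}

$input = '<p>Hello</p><p>World</p>';
$h = new SketchTextHandler( true );
$h->characters( $input, 3, 5, 3, 5 );    // "Hello"
$h->characters( $input, 15, 5, 15, 5 );  // "World"
```

With $keepText set to false the same events would be processed without creating any new strings.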

◆ comment()

Wikimedia\RemexHtml\Tokenizer\TokenGeneratorHandler::comment ( $text,
$sourceStart,
$sourceLength )

A comment.

Parameters
string  $text          The inner text of the comment
int     $sourceStart   The input position
int     $sourceLength  The input length

Implements Wikimedia\RemexHtml\Tokenizer\TokenHandler.

◆ doctype()

Wikimedia\RemexHtml\Tokenizer\TokenGeneratorHandler::doctype ( $name,
$public,
$system,
$quirks,
$sourceStart,
$sourceLength )

A DOCTYPE declaration.

Parameters
string | null  $name          The DOCTYPE name, or null if none was found
string | null  $public        The public identifier, or null if none was found
string | null  $system        The system identifier, or null if none was found
bool           $quirks        What the spec calls the "force-quirks flag"
int            $sourceStart   The input position
int            $sourceLength  The input length

Implements Wikimedia\RemexHtml\Tokenizer\TokenHandler.

◆ endDocument()

Wikimedia\RemexHtml\Tokenizer\TokenGeneratorHandler::endDocument ( $pos)

Called when the end of the input string is consumed.

Parameters
int  $pos  The input position (past the end)

Implements Wikimedia\RemexHtml\Tokenizer\TokenHandler.

◆ endTag()

Wikimedia\RemexHtml\Tokenizer\TokenGeneratorHandler::endTag ( $name,
$sourceStart,
$sourceLength )

An end tag event.

Parameters
string  $name          The tag name
int     $sourceStart   The input position
int     $sourceLength  The input length

Implements Wikimedia\RemexHtml\Tokenizer\TokenHandler.
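
The $sourceStart and $sourceLength pair can be used to recover the original markup for an event. The sketch below assumes both values are byte offsets into the input string that was tokenized; the helper name sketchSourceSlice and the sample offsets are made up for illustration.

```php
<?php
// Hypothetical helper: returns the raw source text covered by an event,
// assuming $sourceStart and $sourceLength are byte offsets into $input.
function sketchSourceSlice( $input, $sourceStart, $sourceLength ) {
    return substr( $input, $sourceStart, $sourceLength );
}

$input = '<div>text</DIV >';
// Offsets as they might be reported for the end tag in this input.
$raw = sketchSourceSlice( $input, 9, 7 );
```

Here $raw holds the end tag exactly as written in the input, including its original case and spacing, which may differ from the $name the event reports.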

◆ error()

Wikimedia\RemexHtml\Tokenizer\TokenGeneratorHandler::error ( $text,
$pos )

This is called for "parse errors" (as defined by the spec).

The spec does not define names for error messages, so we just use some English text for now. The imagined audience is a developer reading validator output.

Parameters
string  $text  The error message
int     $pos   The input position

Implements Wikimedia\RemexHtml\Tokenizer\TokenHandler.

◆ startDocument()

Wikimedia\RemexHtml\Tokenizer\TokenGeneratorHandler::startDocument ( Tokenizer $tokenizer,
$fragmentNamespace,
$fragmentName )

Called once at the start of the document (STATE_START).

Parameters
Tokenizer      $tokenizer          The Tokenizer which generated the event
string | null  $fragmentNamespace  The fragment namespace, or null to run in document mode.
string | null  $fragmentName       The fragment tag name, or null to run in document mode.

Implements Wikimedia\RemexHtml\Tokenizer\TokenHandler.

◆ startTag()

Wikimedia\RemexHtml\Tokenizer\TokenGeneratorHandler::startTag ( $name,
Attributes $attrs,
$selfClose,
$sourceStart,
$sourceLength )

A start tag event.

We call it a tag rather than an element since the start/end events are not balanced, so the relationship between tags and elements is complex. Errors emitted by attribute parsing will not be received until $attrs is accessed by the handler.

Parameters
string      $name          The tag name
Attributes  $attrs         The tag attributes
bool        $selfClose     Whether there is a self-closing slash
int         $sourceStart   The input position
int         $sourceLength  The input length

Implements Wikimedia\RemexHtml\Tokenizer\TokenHandler.
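
The deferred error behaviour can be pictured with a small lazy object: parsing, and therefore error reporting, only happens on first access. SketchLazyAttrs below is a hypothetical, heavily simplified model written for this example; it is not the library's Attributes implementation, its parsing is far cruder than the spec's, and the method name getValues() is used here purely for illustration.

```php
<?php
// Hypothetical model of lazily parsed attributes, for illustration only.
class SketchLazyAttrs {
    private $raw;
    private $onError;
    private $values = null;

    public function __construct( $raw, callable $onError ) {
        $this->raw = $raw;
        $this->onError = $onError;
    }

    public function getValues() {
        if ( $this->values === null ) {
            // First access: parse now, and report any errors now.
            $this->values = [];
            $pairs = preg_split( '/\s+/', trim( $this->raw ), -1, PREG_SPLIT_NO_EMPTY );
            foreach ( $pairs as $pair ) {
                $parts = explode( '=', $pair, 2 );
                $name = $parts[0];
                if ( array_key_exists( $name, $this->values ) ) {
                    ( $this->onError )( "duplicate attribute: $name" );
                    continue; // keep the first occurrence
                }
                $this->values[$name] = isset( $parts[1] ) ? trim( $parts[1], '"' ) : '';
            }
        }
        return $this->values;
    }
}

$errors = [];
$attrs = new SketchLazyAttrs( 'id="a" id="b"', function ( $msg ) use ( &$errors ) {
    $errors[] = $msg;
} );

$errorsBeforeAccess = count( $errors ); // nothing has been parsed yet
$values = $attrs->getValues();          // parsing, and the error, happen here
$errorsAfterAccess = count( $errors );
```

A handler that never touches $attrs would, under this model, never see the duplicate-attribute error at all.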

