RemexHtml
Fast HTML 5 parser
This is the interface for handlers receiving events from the Tokenizer.
Public Member Functions
startDocument (Tokenizer $tokenizer, $fragmentNamespace, $fragmentName)
Called once at the start of the document (STATE_START)
endDocument ( $pos)
Called when the end of the input string is consumed.
error ( $text, $pos)
This is called for "parse errors" (as defined by the spec).
characters ( $text, $start, $length, $sourceStart, $sourceLength)
A merged sequence of character tokens.
startTag ( $name, Attributes $attrs, $selfClose, $sourceStart, $sourceLength)
A start tag event.
endTag ( $name, $sourceStart, $sourceLength)
An end tag event.
doctype ( $name, $public, $system, $quirks, $sourceStart, $sourceLength)
A DOCTYPE declaration.
comment ( $text, $sourceStart, $sourceLength)
A comment.
All events which consume characters give a source offset and length, allowing for input stream patching. The offset and length are relative to the preprocessed input; see Tokenizer::getPreprocessedText().
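As a rough illustration of what "input stream patching" can look like, the sketch below records the source range of each comment and then cuts those ranges out of the input. The class `CommentRangeRecorder` and the function `stripRanges` are hypothetical names invented for this example, only the `comment()` method is shown rather than a full TokenHandler implementation, and the event is simulated by hand instead of coming from a real Tokenizer.

```php
<?php
// Hypothetical handler fragment: only comment() is shown. A real handler
// would implement every method of TokenHandler.
class CommentRangeRecorder {
    /** @var array List of [ sourceStart, sourceLength ] pairs */
    public $ranges = [];

    public function comment( $text, $sourceStart, $sourceLength ) {
        $this->ranges[] = [ $sourceStart, $sourceLength ];
    }
}

/**
 * Remove the given source ranges from the input (hypothetical helper).
 * The input is assumed to be the preprocessed text, since that is what
 * the offsets are relative to.
 */
function stripRanges( $input, array $ranges ) {
    // Patch from the end of the string so earlier offsets stay valid.
    usort( $ranges, function ( $a, $b ) {
        return $b[0] <=> $a[0];
    } );
    foreach ( $ranges as [ $start, $length ] ) {
        $input = substr_replace( $input, '', $start, $length );
    }
    return $input;
}

// Simulated event: the comment "<!-- x -->" starts at byte 4 and is
// 10 bytes long in the input below.
$input = '<p>a<!-- x -->b</p>';
$recorder = new CommentRangeRecorder();
$recorder->comment( ' x ', 4, 10 );
$patched = stripRanges( $input, $recorder->ranges );
```

Working backwards through the ranges is a design choice of this sketch: it avoids having to adjust later offsets after each removal.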
Wikimedia\RemexHtml\Tokenizer\TokenHandler::characters ( $text, $start, $length, $sourceStart, $sourceLength )
A merged sequence of character tokens.
We use the SAX-like convention of requiring the handler to do the substring operation, i.e. the actual text is substr( $text, $start, $length ), since this allows us to avoid some copying, at least if ignoreCharRefs and ignoreNulls are enabled.
string | $text | The string which contains the emitted characters |
int | $start | The start of the range within $text to use |
int | $length | The length of the range within $text to use |
int | $sourceStart | The input position |
int | $sourceLength | The input length |
Implemented in Wikimedia\RemexHtml\Tokenizer\NullTokenHandler, Wikimedia\RemexHtml\Tokenizer\RelayTokenHandler, Wikimedia\RemexHtml\Tokenizer\TestTokenHandler, Wikimedia\RemexHtml\Tokenizer\TokenGeneratorHandler, Wikimedia\RemexHtml\Tokenizer\TokenSerializer, Wikimedia\RemexHtml\TreeBuilder\Dispatcher, and Wikimedia\RemexHtml\TreeBuilder\DispatchTracer.
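The substring convention for characters() can be sketched as follows. `TextCollector` is a hypothetical class made up for this example, it shows only the one method rather than the whole interface, and the two events are simulated by hand with offsets chosen to match the sample buffer.

```php
<?php
// Hypothetical handler fragment: only characters() is shown.
class TextCollector {
    /** @var string Text accumulated from all character events */
    public $text = '';

    public function characters( $text, $start, $length, $sourceStart, $sourceLength ) {
        // $text may be a larger buffer. Only the range given by $start
        // and $length belongs to this event, so the handler extracts it.
        $this->text .= substr( $text, $start, $length );
    }
}

// Simulated events: the same buffer is passed twice with different ranges.
$buffer = '<b>Hello</b> world';
$collector = new TextCollector();
$collector->characters( $buffer, 3, 5, 3, 5 );   // range covering "Hello"
$collector->characters( $buffer, 12, 6, 12, 6 ); // range covering " world"
```

In this simulation the range within $text and the source range happen to coincide. They are separate parameters, so a handler should not assume they are equal.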
Wikimedia\RemexHtml\Tokenizer\TokenHandler::comment ( $text, $sourceStart, $sourceLength )
A comment.
string | $text | The inner text of the comment |
int | $sourceStart | The input position |
int | $sourceLength | The input length |
Implemented in Wikimedia\RemexHtml\Tokenizer\NullTokenHandler, Wikimedia\RemexHtml\Tokenizer\RelayTokenHandler, Wikimedia\RemexHtml\Tokenizer\TestTokenHandler, Wikimedia\RemexHtml\Tokenizer\TokenGeneratorHandler, Wikimedia\RemexHtml\Tokenizer\TokenSerializer, Wikimedia\RemexHtml\TreeBuilder\Dispatcher, and Wikimedia\RemexHtml\TreeBuilder\DispatchTracer.
Wikimedia\RemexHtml\Tokenizer\TokenHandler::doctype ( $name, $public, $system, $quirks, $sourceStart, $sourceLength )
A DOCTYPE declaration.
string|null | $name | The DOCTYPE name, or null if none was found |
string|null | $public | The public identifier, or null if none was found |
string|null | $system | The system identifier, or null if none was found |
bool | $quirks | What the spec calls the "force-quirks flag" |
int | $sourceStart | The input position |
int | $sourceLength | The input length |
Implemented in Wikimedia\RemexHtml\Tokenizer\NullTokenHandler, Wikimedia\RemexHtml\Tokenizer\RelayTokenHandler, Wikimedia\RemexHtml\Tokenizer\TestTokenHandler, Wikimedia\RemexHtml\Tokenizer\TokenGeneratorHandler, Wikimedia\RemexHtml\Tokenizer\TokenSerializer, Wikimedia\RemexHtml\TreeBuilder\Dispatcher, and Wikimedia\RemexHtml\TreeBuilder\DispatchTracer.
Wikimedia\RemexHtml\Tokenizer\TokenHandler::endDocument ( $pos )
Called when the end of the input string is consumed.
int | $pos | The input position (past the end) |
Implemented in Wikimedia\RemexHtml\Tokenizer\NullTokenHandler, Wikimedia\RemexHtml\Tokenizer\RelayTokenHandler, Wikimedia\RemexHtml\Tokenizer\TestTokenHandler, Wikimedia\RemexHtml\Tokenizer\TokenGeneratorHandler, Wikimedia\RemexHtml\Tokenizer\TokenSerializer, Wikimedia\RemexHtml\TreeBuilder\Dispatcher, and Wikimedia\RemexHtml\TreeBuilder\DispatchTracer.
Wikimedia\RemexHtml\Tokenizer\TokenHandler::endTag ( $name, $sourceStart, $sourceLength )
An end tag event.
string | $name | The tag name |
int | $sourceStart | The input position |
int | $sourceLength | The input length |
Implemented in Wikimedia\RemexHtml\Tokenizer\NullTokenHandler, Wikimedia\RemexHtml\Tokenizer\RelayTokenHandler, Wikimedia\RemexHtml\Tokenizer\TestTokenHandler, Wikimedia\RemexHtml\Tokenizer\TokenGeneratorHandler, Wikimedia\RemexHtml\Tokenizer\TokenSerializer, Wikimedia\RemexHtml\TreeBuilder\Dispatcher, and Wikimedia\RemexHtml\TreeBuilder\DispatchTracer.
Wikimedia\RemexHtml\Tokenizer\TokenHandler::error ( $text, $pos )
This is called for "parse errors" (as defined by the spec).
The spec does not define names for error messages, so we just use some English text for now. The imagined audience is a developer reading validator output.
string | $text | The error message |
int | $pos | The input position |
Implemented in Wikimedia\RemexHtml\Tokenizer\NullTokenHandler, Wikimedia\RemexHtml\Tokenizer\RelayTokenHandler, Wikimedia\RemexHtml\Tokenizer\TestTokenHandler, Wikimedia\RemexHtml\Tokenizer\TokenGeneratorHandler, Wikimedia\RemexHtml\Tokenizer\TokenSerializer, Wikimedia\RemexHtml\TreeBuilder\Dispatcher, and Wikimedia\RemexHtml\TreeBuilder\DispatchTracer.
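Since the error text is aimed at a developer reading validator output, one plausible use is to collect the errors and report them with line numbers. The sketch below assumes that approach. `ErrorCollector` and `formatErrors` are hypothetical names, the error message is invented for illustration and is not claimed to be one the library emits, and the event is simulated by hand.

```php
<?php
// Hypothetical handler fragment: only error() is shown.
class ErrorCollector {
    /** @var array List of [ 'message' => string, 'pos' => int ] */
    public $errors = [];

    public function error( $text, $pos ) {
        $this->errors[] = [ 'message' => $text, 'pos' => $pos ];
    }
}

/**
 * Turn collected errors into report lines (hypothetical helper).
 * The line number is derived by counting newlines before the position.
 */
function formatErrors( $input, array $errors ) {
    $lines = [];
    foreach ( $errors as $error ) {
        $line = substr_count( substr( $input, 0, $error['pos'] ), "\n" ) + 1;
        $lines[] = "line $line, offset {$error['pos']}: {$error['message']}";
    }
    return $lines;
}

// Simulated event: the message text is made up for this example.
$input = "<p>\n<b x=1 x=2>";
$collector = new ErrorCollector();
$collector->error( 'duplicate attribute', 11 );
$report = formatErrors( $input, $collector->errors );
```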
Wikimedia\RemexHtml\Tokenizer\TokenHandler::startDocument ( Tokenizer $tokenizer, $fragmentNamespace, $fragmentName )
Called once at the start of the document (STATE_START)
Tokenizer | $tokenizer | The Tokenizer which generated the event |
string|null | $fragmentNamespace | The fragment namespace, or null to run in document mode. |
string|null | $fragmentName | The fragment tag name, or null to run in document mode. |
Implemented in Wikimedia\RemexHtml\Tokenizer\NullTokenHandler, Wikimedia\RemexHtml\Tokenizer\TestTokenHandler, Wikimedia\RemexHtml\Tokenizer\TokenSerializer, Wikimedia\RemexHtml\Tokenizer\RelayTokenHandler, Wikimedia\RemexHtml\Tokenizer\TokenGeneratorHandler, Wikimedia\RemexHtml\TreeBuilder\Dispatcher, and Wikimedia\RemexHtml\TreeBuilder\DispatchTracer.
Wikimedia\RemexHtml\Tokenizer\TokenHandler::startTag ( $name, Attributes $attrs, $selfClose, $sourceStart, $sourceLength )
A start tag event.
We call it a tag rather than an element since the start/end events are not balanced, so the relationship between tags and elements is complex. Errors emitted by attribute parsing will not be received until $attrs is accessed by the handler.
string | $name | The tag name |
Attributes | $attrs | The tag attributes |
bool | $selfClose | Whether there is a self-closing slash |
int | $sourceStart | The input position |
int | $sourceLength | The input length |
Implemented in Wikimedia\RemexHtml\Tokenizer\NullTokenHandler, Wikimedia\RemexHtml\Tokenizer\RelayTokenHandler, Wikimedia\RemexHtml\Tokenizer\TestTokenHandler, Wikimedia\RemexHtml\Tokenizer\TokenGeneratorHandler, Wikimedia\RemexHtml\Tokenizer\TokenSerializer, Wikimedia\RemexHtml\TreeBuilder\Dispatcher, and Wikimedia\RemexHtml\TreeBuilder\DispatchTracer.
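To make the point about unbalanced start/end events concrete, the sketch below counts start and end tags per name and reports the difference. `TagCounter` is a hypothetical class for this example. Its startTag() omits the Attributes type hint so the snippet runs without the library, and the events are simulated by hand for the input `<p>one<p>two</p>`.

```php
<?php
// Hypothetical handler fragment: only startTag() and endTag() are shown.
class TagCounter {
    /** @var array Start tag counts keyed by tag name */
    public $opened = [];
    /** @var array End tag counts keyed by tag name */
    public $closed = [];

    public function startTag( $name, $attrs, $selfClose, $sourceStart, $sourceLength ) {
        // $attrs is deliberately left untouched here. Per the description
        // above, attribute parse errors are only delivered once it is accessed.
        $this->opened[$name] = ( $this->opened[$name] ?? 0 ) + 1;
    }

    public function endTag( $name, $sourceStart, $sourceLength ) {
        $this->closed[$name] = ( $this->closed[$name] ?? 0 ) + 1;
    }

    /** Tag names whose start and end counts differ, with the difference. */
    public function unbalanced() {
        $result = [];
        foreach ( array_keys( $this->opened + $this->closed ) as $name ) {
            $diff = ( $this->opened[$name] ?? 0 ) - ( $this->closed[$name] ?? 0 );
            if ( $diff !== 0 ) {
                $result[$name] = $diff;
            }
        }
        return $result;
    }
}

// Simulated events for '<p>one<p>two</p>': two start tags, one end tag.
// null stands in for the Attributes object, which this sketch never reads.
$counter = new TagCounter();
$counter->startTag( 'p', null, false, 0, 3 );
$counter->startTag( 'p', null, false, 6, 3 );
$counter->endTag( 'p', 12, 4 );
$unbalanced = $counter->unbalanced();
```

A non-empty result here does not by itself indicate invalid markup. It only shows that a token handler cannot treat tag events as matched pairs.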