RemexHtml
Fast HTML 5 parser
This is the approximate equivalent of the "tree construction dispatcher" in the spec.
Public Member Functions | ||||
__construct (TreeBuilder $builder) | ||||
InsertionMode | switchMode ( $mode) | |||
Switch the insertion mode, and return the new handler. | ||||
InsertionMode | switchAndSave ( $mode) | |||
Let the original insertion mode be the current insertion mode, and switch the insertion mode to some new value. | ||||
InsertionMode | restoreMode () | |||
Switch the insertion mode to the original insertion mode and return the new handler. | ||||
InsertionMode | getHandler () | |||
Get the handler for the current insertion mode in HTML content. | ||||
bool | isInTableMode () | |||
True if we are in a table mode, for the purposes of switching to IN_SELECT_IN_TABLE as opposed to IN_SELECT. | ||||
flushTableText () | ||||
If the insertion mode is "in table text", flush the pending table text. | ||||
InsertionMode | reset () | |||
Reset the insertion mode appropriately, and return the new handler. | ||||
startDocument (Tokenizer $tokenizer, $namespace, $name) | ||||
Called once at the start of the document (STATE_START) | ||||
endDocument ( $pos) | ||||
Called when the end of the input string is consumed. | ||||
error ( $text, $pos) | ||||
This is called for "parse errors" (as defined by the spec). | ||||
characters ( $text, $start, $length, $sourceStart, $sourceLength) | ||||
A merged sequence of character tokens. | ||||
startTag ( $name, Attributes $attrs, $selfClose, $sourceStart, $sourceLength) | ||||
A start tag event. | ||||
endTag ( $name, $sourceStart, $sourceLength) | ||||
An end tag event. | ||||
doctype ( $name, $public, $system, $quirks, $sourceStart, $sourceLength) | ||||
A DOCTYPE declaration. | ||||
comment ( $text, $sourceStart, $sourceLength) | ||||
A comment. | ||||
Public Attributes | |
const | INITIAL = 1 |
The insertion mode indexes. | |
const | BEFORE_HTML = 2 |
const | BEFORE_HEAD = 3 |
const | IN_HEAD = 4 |
const | IN_HEAD_NOSCRIPT = 5 |
const | AFTER_HEAD = 6 |
const | IN_BODY = 7 |
const | TEXT = 8 |
const | IN_TABLE = 9 |
const | IN_TABLE_TEXT = 10 |
const | IN_CAPTION = 11 |
const | IN_COLUMN_GROUP = 12 |
const | IN_TABLE_BODY = 13 |
const | IN_ROW = 14 |
const | IN_CELL = 15 |
const | IN_SELECT = 16 |
const | IN_SELECT_IN_TABLE = 17 |
const | IN_TEMPLATE = 18 |
const | AFTER_BODY = 19 |
const | IN_FRAMESET = 20 |
const | AFTER_FRAMESET = 21 |
const | AFTER_AFTER_BODY = 22 |
const | AFTER_AFTER_FRAMESET = 23 |
const | IN_FOREIGN_CONTENT = 24 |
const | IN_PRE = 25 |
const | IN_TEXTAREA = 26 |
InHead | $inHead |
InBody | $inBody |
InTable | $inTable |
InSelect | $inSelect |
InTemplate | $inTemplate |
InForeignContent | $inForeign |
bool null | $ack |
The insertion mode sets this to true to acknowledge the tag's self-closing flag. | |
TemplateModeStack | $templateModeStack |
The stack of template insertion modes. | |
Protected Member Functions | |
int | getAppropriateMode () |
Get the insertion mode index which is switched to when we reset the insertion mode appropriately. | |
Element null | dispatcherCurrentNode () |
If the stack of open elements is empty, return null, otherwise return the adjusted current node. | |
Protected Attributes | |
TreeBuilder | $builder |
InsertionMode | $handler |
The InsertionMode object for the current insertion mode in HTML content. | |
InsertionMode[] | $dispatchTable |
An array mapping insertion mode indexes to InsertionMode objects. | |
int | $mode |
The insertion mode index. | |
int | $originalMode |
The "original insertion mode" index. | |
Static Protected Attributes | |
static array | $handlerClasses |
The handler class for each insertion mode. | |
This is the approximate equivalent of the "tree construction dispatcher" in the spec.
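It receives token events and distributes them to the appropriate insertion mode class. It also implements some things specific to the dispatcher state, such as resetting the insertion mode appropriately, tracking the "original insertion mode", and holding the stack of template insertion modes.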
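The dispatching idea can be illustrated with a minimal, self-contained sketch. Every class name below (`SketchMode`, `SketchInHead`, `SketchInBody`, `SketchDispatcher`) is hypothetical and heavily simplified; only the mode numbers and the idea of a table mapping mode indexes to handler objects are taken from this page. It is not RemexHtml's actual implementation.

```php
<?php
// Hypothetical, simplified stand-ins; these are not RemexHtml's real classes.
interface SketchMode {
    // Handle a start tag; the handler may ask the dispatcher to switch modes.
    public function startTag( SketchDispatcher $d, string $name ): string;
}

class SketchInHead implements SketchMode {
    public function startTag( SketchDispatcher $d, string $name ): string {
        if ( $name === 'body' ) {
            // Later tokens will be routed to the "in body" handler.
            $d->switchMode( SketchDispatcher::IN_BODY );
            return 'in head: saw <body>, switching';
        }
        return "in head: <$name>";
    }
}

class SketchInBody implements SketchMode {
    public function startTag( SketchDispatcher $d, string $name ): string {
        return "in body: <$name>";
    }
}

class SketchDispatcher {
    const IN_HEAD = 4;
    const IN_BODY = 7;

    /** @var SketchMode[] Insertion mode index => handler object */
    private $dispatchTable;
    /** @var SketchMode Handler for the current insertion mode */
    private $handler;

    public function __construct() {
        $this->dispatchTable = [
            self::IN_HEAD => new SketchInHead,
            self::IN_BODY => new SketchInBody,
        ];
        $this->switchMode( self::IN_HEAD );
    }

    // Switch the insertion mode and return the new handler.
    public function switchMode( int $mode ): SketchMode {
        return $this->handler = $this->dispatchTable[$mode];
    }

    // Forward the token event to whichever handler is current.
    public function startTag( string $name ): string {
        return $this->handler->startTag( $this, $name );
    }
}

$d = new SketchDispatcher;
$log = [ $d->startTag( 'title' ), $d->startTag( 'body' ), $d->startTag( 'p' ) ];
```

In this sketch the dispatcher itself contains no parsing rules; it only tracks which handler is current and forwards each event to it.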
Wikimedia\RemexHtml\TreeBuilder\Dispatcher::__construct | ( | TreeBuilder | $builder | ) |
TreeBuilder | $builder |
Wikimedia\RemexHtml\TreeBuilder\Dispatcher::characters | ( | $text, | |
$start, | |||
$length, | |||
$sourceStart, | |||
$sourceLength ) |
A merged sequence of character tokens.
We use the SAX-like convention of requiring the handler to do the substring operation: the actual text is substr( $text, $start, $length ). This allows us to avoid some copying, at least if ignoreCharRefs and ignoreNulls are enabled.
string | $text | The string which contains the emitted characters |
int | $start | The start of the range within $text to use |
int | $length | The length of the range within $text to use |
int | $sourceStart | The input position |
int | $sourceLength | The input length |
Implements Wikimedia\RemexHtml\Tokenizer\TokenHandler.
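As a small illustration of the substring convention, the sketch below shows a handler extracting the emitted text itself. `SketchTextCollector` is a hypothetical class made up for this example; only the parameter order follows the signature documented above.

```php
<?php
// Hypothetical collector; only the parameter convention mirrors the documented signature.
class SketchTextCollector {
    /** @var string[] Text chunks extracted so far */
    public $chunks = [];

    public function characters( $text, $start, $length, $sourceStart, $sourceLength ) {
        // The emitted characters are a range within $text, not all of $text.
        $this->chunks[] = substr( $text, $start, $length );
    }
}

$input = '<p>Hello</p>';
$collector = new SketchTextCollector;
// The caller passes the whole string plus offsets instead of copying "Hello" out first.
$collector->characters( $input, 3, 5, 3, 5 );
```

A handler that does not need the text (for example, one that only counts tokens) can skip the `substr()` call entirely, which is where the saving comes from.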
Wikimedia\RemexHtml\TreeBuilder\Dispatcher::comment | ( | $text, | |
$sourceStart, | |||
$sourceLength ) |
A comment.
string | $text | The inner text of the comment |
int | $sourceStart | The input position |
int | $sourceLength | The input length |
Implements Wikimedia\RemexHtml\Tokenizer\TokenHandler.
Element null Wikimedia\RemexHtml\TreeBuilder\Dispatcher::dispatcherCurrentNode | ( | ) | protected |
If the stack of open elements is empty, return null, otherwise return the adjusted current node.
Wikimedia\RemexHtml\TreeBuilder\Dispatcher::doctype | ( | $name, | |
$public, | |||
$system, | |||
$quirks, | |||
$sourceStart, | |||
$sourceLength ) |
A DOCTYPE declaration.
string | null | $name | The DOCTYPE name, or null if none was found |
string | null | $public | The public identifier, or null if none was found |
string | null | $system | The system identifier, or null if none was found |
bool | $quirks | What the spec calls the "force-quirks flag" |
int | $sourceStart | The input position |
int | $sourceLength | The input length |
Implements Wikimedia\RemexHtml\Tokenizer\TokenHandler.
Wikimedia\RemexHtml\TreeBuilder\Dispatcher::endDocument | ( | $pos | ) |
Called when the end of the input string is consumed.
int | $pos | The input position (past the end) |
Implements Wikimedia\RemexHtml\Tokenizer\TokenHandler.
Wikimedia\RemexHtml\TreeBuilder\Dispatcher::endTag | ( | $name, | |
$sourceStart, | |||
$sourceLength ) |
An end tag event.
string | $name | The tag name |
int | $sourceStart | The input position |
int | $sourceLength | The input length |
Implements Wikimedia\RemexHtml\Tokenizer\TokenHandler.
Wikimedia\RemexHtml\TreeBuilder\Dispatcher::error | ( | $text, | |
$pos ) |
This is called for "parse errors" (as defined by the spec).
The spec does not define names for error messages, so we just use some English text for now. The imagined audience is a developer reading validator output.
string | $text | The error message |
int | $pos | The input position |
Implements Wikimedia\RemexHtml\Tokenizer\TokenHandler.
Wikimedia\RemexHtml\TreeBuilder\Dispatcher::flushTableText | ( | ) |
If the insertion mode is "in table text", flush the pending table text.
This is a facility allowing users to insert into the DOM more cleanly.
int Wikimedia\RemexHtml\TreeBuilder\Dispatcher::getAppropriateMode | ( | ) | protected |
Get the insertion mode index which is switched to when we reset the insertion mode appropriately.
InsertionMode Wikimedia\RemexHtml\TreeBuilder\Dispatcher::getHandler | ( | ) |
Get the handler for the current insertion mode in HTML content.
This is used by the "in foreign" handler to execute the HTML insertion mode. It does not necessarily correspond to the handler currently being executed.
bool Wikimedia\RemexHtml\TreeBuilder\Dispatcher::isInTableMode | ( | ) |
True if we are in a table mode, for the purposes of switching to IN_SELECT_IN_TABLE as opposed to IN_SELECT.
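In the HTML specification, a `select` start tag switches to "in select in table" when the current insertion mode is "in table", "in caption", "in table body", "in row" or "in cell", and to "in select" otherwise. A rough sketch of that choice follows, using the constant values listed on this page; the constant `SKETCH_TABLE_MODES` and the function `sketchSelectMode()` are hypothetical names, and the real method may be implemented differently.

```php
<?php
// Mode indexes as listed in the "Public Attributes" section of this page.
const SKETCH_IN_TABLE = 9;
const SKETCH_IN_CAPTION = 11;
const SKETCH_IN_TABLE_BODY = 13;
const SKETCH_IN_ROW = 14;
const SKETCH_IN_CELL = 15;
const SKETCH_IN_SELECT = 16;
const SKETCH_IN_SELECT_IN_TABLE = 17;

// Hypothetical lookup set of the modes the spec treats as table modes here.
const SKETCH_TABLE_MODES = [
    SKETCH_IN_TABLE => true,
    SKETCH_IN_CAPTION => true,
    SKETCH_IN_TABLE_BODY => true,
    SKETCH_IN_ROW => true,
    SKETCH_IN_CELL => true,
];

// Pick the mode to enter when a <select> start tag is seen.
function sketchSelectMode( int $currentMode ): int {
    return isset( SKETCH_TABLE_MODES[$currentMode] )
        ? SKETCH_IN_SELECT_IN_TABLE
        : SKETCH_IN_SELECT;
}
```

For example, `sketchSelectMode( SKETCH_IN_ROW )` yields the "in select in table" index, while a mode outside the set, such as 7 (IN_BODY), yields plain "in select".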
InsertionMode Wikimedia\RemexHtml\TreeBuilder\Dispatcher::reset | ( | ) |
Reset the insertion mode appropriately, and return the new handler.
InsertionMode Wikimedia\RemexHtml\TreeBuilder\Dispatcher::restoreMode | ( | ) |
Switch the insertion mode to the original insertion mode and return the new handler.
Wikimedia\RemexHtml\TreeBuilder\Dispatcher::startDocument | ( | Tokenizer | $tokenizer, |
$fragmentNamespace, | |||
$fragmentName ) |
Called once at the start of the document (STATE_START)
Tokenizer | $tokenizer | The Tokenizer which generated the event |
string | null | $fragmentNamespace | The fragment namespace, or null to run in document mode. |
string | null | $fragmentName | The fragment tag name, or null to run in document mode. |
Implements Wikimedia\RemexHtml\Tokenizer\TokenHandler.
Wikimedia\RemexHtml\TreeBuilder\Dispatcher::startTag | ( | $name, | |
Attributes | $attrs, | ||
$selfClose, | |||
$sourceStart, | |||
$sourceLength ) |
A start tag event.
We call it a tag rather than an element since the start/end events are not balanced, so the relationship between tags and elements is complex. Errors emitted by attribute parsing will not be received until $attrs is accessed by the handler.
string | $name | The tag name |
Attributes | $attrs | The tag attributes |
bool | $selfClose | Whether there is a self-closing slash |
int | $sourceStart | The input position |
int | $sourceLength | The input length |
Implements Wikimedia\RemexHtml\Tokenizer\TokenHandler.
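The `$ack` attribute listed earlier exists so that an insertion mode can acknowledge a self-closing flag. One plausible way this could work is sketched below, under the assumption that the dispatcher clears the flag before dispatching and reports a parse error afterwards if a self-closing slash went unacknowledged. The class `SketchAckDispatcher`, its `handleStartTag()` helper, the list of void elements, and the error text are all invented for illustration.

```php
<?php
// Hypothetical sketch; names other than $ack and $selfClose are invented.
class SketchAckDispatcher {
    /** @var bool Set to true by the handler to acknowledge the self-closing flag */
    public $ack = false;
    /** @var string[] Parse errors collected so far */
    public $errors = [];

    public function startTag( $name, $selfClose, $pos ) {
        $this->ack = false;
        $this->handleStartTag( $name, $selfClose );
        if ( $selfClose && !$this->ack ) {
            // Nobody acknowledged the slash, so report it as a parse error.
            $this->errors[] = "unacknowledged self-closing flag on <$name> at $pos";
        }
    }

    // Stand-in for an insertion mode handler.
    private function handleStartTag( $name, $selfClose ) {
        // Void elements such as <br> may legitimately be written as <br/>.
        if ( in_array( $name, [ 'br', 'img', 'hr' ], true ) ) {
            $this->ack = true;
        }
    }
}

$d = new SketchAckDispatcher;
$d->startTag( 'br', true, 0 );
$d->startTag( 'div', true, 5 );
```

Here `<br/>` passes silently, while `<div/>` is recorded as an error because no handler acknowledged its slash.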
InsertionMode Wikimedia\RemexHtml\TreeBuilder\Dispatcher::switchAndSave | ( | $mode | ) |
Let the original insertion mode be the current insertion mode, and switch the insertion mode to some new value.
Return the new handler.
int | $mode |
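The interplay between switchAndSave() and restoreMode() can be sketched as a simple save-and-restore of the mode index. `SketchModeState` is a hypothetical class: unlike the documented methods it returns the mode index rather than a handler object, and clearing the saved mode and throwing when nothing is saved are assumptions of this sketch, not documented behaviour.

```php
<?php
// Hypothetical minimal state holder; mode numbers match the constants listed on this page.
class SketchModeState {
    const IN_HEAD = 4;
    const TEXT = 8;

    /** @var int Current insertion mode index */
    public $mode = self::IN_HEAD;
    /** @var int|null The saved "original insertion mode", if any */
    public $originalMode = null;

    // Remember the current mode, then switch to the new one.
    public function switchAndSave( $mode ) {
        $this->originalMode = $this->mode;
        $this->mode = $mode;
        return $this->mode;
    }

    // Return to the remembered mode (assumption: the saved value is then cleared).
    public function restoreMode() {
        if ( $this->originalMode === null ) {
            throw new LogicException( 'no original insertion mode saved' );
        }
        $this->mode = $this->originalMode;
        $this->originalMode = null;
        return $this->mode;
    }
}

$state = new SketchModeState;
$afterSwitch = $state->switchAndSave( SketchModeState::TEXT );
$afterRestore = $state->restoreMode();
```

This is the pattern the spec uses around raw text content, where a handler enters the "text" mode and later returns to wherever it came from.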
InsertionMode Wikimedia\RemexHtml\TreeBuilder\Dispatcher::switchMode | ( | $mode | ) |
Switch the insertion mode, and return the new handler.
int | $mode |
static array Wikimedia\RemexHtml\TreeBuilder\Dispatcher::$handlerClasses | static protected |
The handler class for each insertion mode.
InBody Wikimedia\RemexHtml\TreeBuilder\Dispatcher::$inBody |
InForeignContent Wikimedia\RemexHtml\TreeBuilder\Dispatcher::$inForeign |
InHead Wikimedia\RemexHtml\TreeBuilder\Dispatcher::$inHead |
InSelect Wikimedia\RemexHtml\TreeBuilder\Dispatcher::$inSelect |
InTable Wikimedia\RemexHtml\TreeBuilder\Dispatcher::$inTable |
InTemplate Wikimedia\RemexHtml\TreeBuilder\Dispatcher::$inTemplate |