RemexHtml
Fast HTML 5 parser
Wikimedia\RemexHtml\TreeBuilder\Dispatcher Class Reference

This is the approximate equivalent of the "tree construction dispatcher" in the spec.


Public Member Functions

 __construct (TreeBuilder $builder)
 
InsertionMode switchMode ( $mode)
 Switch the insertion mode, and return the new handler.
 
InsertionMode switchAndSave ( $mode)
 Let the original insertion mode be the current insertion mode, and switch the insertion mode to some new value.
 
InsertionMode restoreMode ()
 Switch the insertion mode to the original insertion mode and return the new handler.
 
InsertionMode getHandler ()
 Get the handler for the current insertion mode in HTML content.
 
bool isInTableMode ()
 True if we are in a table mode, for the purposes of switching to IN_SELECT_IN_TABLE as opposed to IN_SELECT.
 
 flushTableText ()
 If the insertion mode is "in table text", flush the pending table text.
 
InsertionMode reset ()
 Reset the insertion mode appropriately, and return the new handler.
 
 startDocument (Tokenizer $tokenizer, $namespace, $name)
 Called once at the start of the document (STATE_START)
 
 endDocument ( $pos)
 Called when the end of the input string is consumed.
 
 error ( $text, $pos)
 This is called for "parse errors" (as defined by the spec).
 
 characters ( $text, $start, $length, $sourceStart, $sourceLength)
 A merged sequence of character tokens.
 
 startTag ( $name, Attributes $attrs, $selfClose, $sourceStart, $sourceLength)
 A start tag event.
 
 endTag ( $name, $sourceStart, $sourceLength)
 An end tag event.
 
 doctype ( $name, $public, $system, $quirks, $sourceStart, $sourceLength)
 A DOCTYPE declaration.
 
 comment ( $text, $sourceStart, $sourceLength)
 A comment.
 

Public Attributes

const INITIAL = 1
 The insertion mode indexes.
 
const BEFORE_HTML = 2
 
const BEFORE_HEAD = 3
 
const IN_HEAD = 4
 
const IN_HEAD_NOSCRIPT = 5
 
const AFTER_HEAD = 6
 
const IN_BODY = 7
 
const TEXT = 8
 
const IN_TABLE = 9
 
const IN_TABLE_TEXT = 10
 
const IN_CAPTION = 11
 
const IN_COLUMN_GROUP = 12
 
const IN_TABLE_BODY = 13
 
const IN_ROW = 14
 
const IN_CELL = 15
 
const IN_SELECT = 16
 
const IN_SELECT_IN_TABLE = 17
 
const IN_TEMPLATE = 18
 
const AFTER_BODY = 19
 
const IN_FRAMESET = 20
 
const AFTER_FRAMESET = 21
 
const AFTER_AFTER_BODY = 22
 
const AFTER_AFTER_FRAMESET = 23
 
const IN_FOREIGN_CONTENT = 24
 
const IN_PRE = 25
 
const IN_TEXTAREA = 26
 
InHead $inHead
 
InBody $inBody
 
InTable $inTable
 
InSelect $inSelect
 
InTemplate $inTemplate
 
InForeignContent $inForeign
 
bool|null $ack
 The insertion mode sets this to true to acknowledge the tag's self-closing flag.
 
TemplateModeStack $templateModeStack
 The stack of template insertion modes.
 

Protected Member Functions

int getAppropriateMode ()
 Get the insertion mode index which is switched to when we reset the insertion mode appropriately.
 
Element|null dispatcherCurrentNode ()
 If the stack of open elements is empty, return null, otherwise return the adjusted current node.
 

Protected Attributes

TreeBuilder $builder
 
InsertionMode $handler
 The InsertionMode object for the current insertion mode in HTML content.
 
InsertionMode[] $dispatchTable
 An array mapping insertion mode indexes to InsertionMode objects.
 
int $mode
 The insertion mode index.
 
int $originalMode
 The "original insertion mode" index.
 

Static Protected Attributes

static array $handlerClasses
 The handler class for each insertion mode.
 

Detailed Description

This is the approximate equivalent of the "tree construction dispatcher" in the spec.

It receives token events and distributes them to the appropriate insertion mode class. It also implements some things specific to the dispatcher state:

  • "Reset the insertion mode appropriately"
  • The stack of template insertion modes
  • The "original insertion mode"

Constructor & Destructor Documentation

◆ __construct()

Wikimedia\RemexHtml\TreeBuilder\Dispatcher::__construct ( TreeBuilder $builder)
Parameters
TreeBuilder $builder

Member Function Documentation

◆ characters()

Wikimedia\RemexHtml\TreeBuilder\Dispatcher::characters ( $text,
$start,
$length,
$sourceStart,
$sourceLength )

A merged sequence of character tokens.

We use the SAX-like convention of requiring the handler to do the substring operation, i.e. the actual text is substr( $text, $start, $length ), since this allows us to avoid some copying, at least if ignoreCharRefs and ignoreNulls are enabled.

Parameters
string $text: The string which contains the emitted characters
int $start: The start of the range within $text to use
int $length: The length of the range within $text to use
int $sourceStart: The input position
int $sourceLength: The input length

Implements Wikimedia\RemexHtml\Tokenizer\TokenHandler.
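The substring convention described above can be illustrated with a minimal, self-contained sketch. The function name extractCharacters() and the sample buffer are hypothetical and are not part of RemexHtml; the sketch only shows how a handler would recover the actual text from the ($text, $start, $length) triple.

```php
<?php
// Hypothetical helper, not part of RemexHtml: a handler receiving
// characters( $text, $start, $length, ... ) performs the substring
// operation itself instead of being handed a pre-cut string.
function extractCharacters( string $text, int $start, int $length ): string {
	return substr( $text, $start, $length );
}

// Assumed sample input: the whole buffer is passed through unchanged,
// and the range selects only the character data between the tags.
$buffer = '<p>Hello, world</p>';
echo extractCharacters( $buffer, 3, 12 ), "\n";
```

Because the buffer is passed along untouched, a handler that ignores a range never pays for a copy of it.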

◆ comment()

Wikimedia\RemexHtml\TreeBuilder\Dispatcher::comment ( $text,
$sourceStart,
$sourceLength )

A comment.

Parameters
string $text: The inner text of the comment
int $sourceStart: The input position
int $sourceLength: The input length

Implements Wikimedia\RemexHtml\Tokenizer\TokenHandler.

◆ dispatcherCurrentNode()

Element|null Wikimedia\RemexHtml\TreeBuilder\Dispatcher::dispatcherCurrentNode ( )
protected

If the stack of open elements is empty, return null, otherwise return the adjusted current node.

Returns
Element|null
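As a rough illustration of the rule above, the following sketch follows the HTML specification's definition of the adjusted current node: the context element when parsing a fragment and the stack holds exactly one element, otherwise the current node. Elements are modelled as plain strings, and the function name and parameters are hypothetical; this is not RemexHtml's implementation.

```php
<?php
// Hypothetical model: the stack of open elements is a list of tag names,
// and $contextElement is non-null only in the fragment case.
function adjustedCurrentNode( array $openElements, ?string $contextElement ): ?string {
	if ( $openElements === [] ) {
		// Empty stack: there is no current node at all.
		return null;
	}
	if ( $contextElement !== null && count( $openElements ) === 1 ) {
		// Fragment case with only the root on the stack.
		return $contextElement;
	}
	// Otherwise the current node is the bottommost (last pushed) element.
	return $openElements[count( $openElements ) - 1];
}
```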

◆ doctype()

Wikimedia\RemexHtml\TreeBuilder\Dispatcher::doctype ( $name,
$public,
$system,
$quirks,
$sourceStart,
$sourceLength )

A DOCTYPE declaration.

Parameters
string|null $name: The DOCTYPE name, or null if none was found
string|null $public: The public identifier, or null if none was found
string|null $system: The system identifier, or null if none was found
bool $quirks: What the spec calls the "force-quirks flag"
int $sourceStart: The input position
int $sourceLength: The input length

Implements Wikimedia\RemexHtml\Tokenizer\TokenHandler.

◆ endDocument()

Wikimedia\RemexHtml\TreeBuilder\Dispatcher::endDocument ( $pos)

Called when the end of the input string is consumed.

Parameters
int $pos: The input position (past the end)

Implements Wikimedia\RemexHtml\Tokenizer\TokenHandler.

◆ endTag()

Wikimedia\RemexHtml\TreeBuilder\Dispatcher::endTag ( $name,
$sourceStart,
$sourceLength )

An end tag event.

Parameters
string $name: The tag name
int $sourceStart: The input position
int $sourceLength: The input length

Implements Wikimedia\RemexHtml\Tokenizer\TokenHandler.

◆ error()

Wikimedia\RemexHtml\TreeBuilder\Dispatcher::error ( $text,
$pos )

This is called for "parse errors" (as defined by the spec).

The spec does not define names for error messages, so we just use some English text for now. The imagined audience is a developer reading validator output.

Parameters
string $text: The error message
int $pos: The input position

Implements Wikimedia\RemexHtml\Tokenizer\TokenHandler.

◆ flushTableText()

Wikimedia\RemexHtml\TreeBuilder\Dispatcher::flushTableText ( )

If the insertion mode is "in table text", flush the pending table text.

This is a facility allowing users to insert into the DOM more cleanly.

◆ getAppropriateMode()

int Wikimedia\RemexHtml\TreeBuilder\Dispatcher::getAppropriateMode ( )
protected

Get the insertion mode index which is switched to when we reset the insertion mode appropriately.

Returns
int

◆ getHandler()

InsertionMode Wikimedia\RemexHtml\TreeBuilder\Dispatcher::getHandler ( )

Get the handler for the current insertion mode in HTML content.

This is used by the "in foreign" handler to execute the HTML insertion mode. It does not necessarily correspond to the handler currently being executed.

Returns
InsertionMode

◆ isInTableMode()

bool Wikimedia\RemexHtml\TreeBuilder\Dispatcher::isInTableMode ( )

True if we are in a table mode, for the purposes of switching to IN_SELECT_IN_TABLE as opposed to IN_SELECT.

Returns
bool

◆ reset()

InsertionMode Wikimedia\RemexHtml\TreeBuilder\Dispatcher::reset ( )

Reset the insertion mode appropriately, and return the new handler.

Returns
InsertionMode

◆ restoreMode()

InsertionMode Wikimedia\RemexHtml\TreeBuilder\Dispatcher::restoreMode ( )

Switch the insertion mode to the original insertion mode and return the new handler.

Returns
InsertionMode

◆ startDocument()

Wikimedia\RemexHtml\TreeBuilder\Dispatcher::startDocument ( Tokenizer $tokenizer,
$fragmentNamespace,
$fragmentName )

Called once at the start of the document (STATE_START)

Parameters
Tokenizer $tokenizer: The Tokenizer which generated the event
string|null $fragmentNamespace: The fragment namespace, or null to run in document mode.
string|null $fragmentName: The fragment tag name, or null to run in document mode.

Implements Wikimedia\RemexHtml\Tokenizer\TokenHandler.

◆ startTag()

Wikimedia\RemexHtml\TreeBuilder\Dispatcher::startTag ( $name,
Attributes $attrs,
$selfClose,
$sourceStart,
$sourceLength )

A start tag event.

We call it a tag rather than an element since the start/end events are not balanced, so the relationship between tags and elements is complex. Errors emitted by attribute parsing will not be received until $attrs is accessed by the handler.

Parameters
string $name: The tag name
Attributes $attrs: The tag attributes
bool $selfClose: Whether there is a self-closing slash
int $sourceStart: The input position
int $sourceLength: The input length

Implements Wikimedia\RemexHtml\Tokenizer\TokenHandler.

◆ switchAndSave()

InsertionMode Wikimedia\RemexHtml\TreeBuilder\Dispatcher::switchAndSave ( $mode)

Let the original insertion mode be the current insertion mode, and switch the insertion mode to some new value.

Return the new handler.

Parameters
int $mode
Returns
InsertionMode
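The interplay between switchMode(), switchAndSave() and restoreMode() can be sketched with a small stand-alone class. ModeTracker is hypothetical and is not RemexHtml code: it returns the mode index rather than an InsertionMode handler, and clearing the saved mode after a restore is an assumption made for the sketch. The constant values mirror IN_BODY and TEXT as listed on this page.

```php
<?php
// Hypothetical miniature of the "original insertion mode" bookkeeping.
final class ModeTracker {
	public const IN_BODY = 7;
	public const TEXT = 8;

	private int $mode;
	private ?int $originalMode = null;

	public function __construct( int $mode ) {
		$this->mode = $mode;
	}

	// Switch the insertion mode and return the new mode index.
	public function switchMode( int $mode ): int {
		$this->mode = $mode;
		return $this->mode;
	}

	// Remember the current mode as the original mode, then switch.
	public function switchAndSave( int $mode ): int {
		$this->originalMode = $this->mode;
		return $this->switchMode( $mode );
	}

	// Switch back to the remembered mode.
	public function restoreMode(): int {
		if ( $this->originalMode === null ) {
			throw new LogicException( 'No original insertion mode was saved' );
		}
		$mode = $this->originalMode;
		$this->originalMode = null; // assumption: the saved value is single-use
		return $this->switchMode( $mode );
	}
}
```

A typical sequence under these assumptions is to call switchAndSave( ModeTracker::TEXT ) when entering raw text content from the body, and restoreMode() once the matching end tag is seen.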

◆ switchMode()

InsertionMode Wikimedia\RemexHtml\TreeBuilder\Dispatcher::switchMode ( $mode)

Switch the insertion mode, and return the new handler.

Parameters
int $mode
Returns
InsertionMode

Member Data Documentation

◆ $builder

TreeBuilder Wikimedia\RemexHtml\TreeBuilder\Dispatcher::$builder
protected

◆ $handlerClasses

array Wikimedia\RemexHtml\TreeBuilder\Dispatcher::$handlerClasses
static protected
Initial value:
= array(
self::INITIAL => Initial::class,
self::BEFORE_HTML => BeforeHtml::class,
self::BEFORE_HEAD => BeforeHead::class,
self::IN_HEAD => InHead::class,
self::IN_HEAD_NOSCRIPT => InHeadNoscript::class,
self::AFTER_HEAD => AfterHead::class,
self::IN_BODY => InBody::class,
self::TEXT => Text::class,
self::IN_TABLE => InTable::class,
self::IN_TABLE_TEXT => InTableText::class,
self::IN_CAPTION => InCaption::class,
self::IN_COLUMN_GROUP => InColumnGroup::class,
self::IN_TABLE_BODY => InTableBody::class,
self::IN_ROW => InRow::class,
self::IN_CELL => InCell::class,
self::IN_SELECT => InSelect::class,
self::IN_SELECT_IN_TABLE => InSelectInTable::class,
self::IN_TEMPLATE => InTemplate::class,
self::AFTER_BODY => AfterBody::class,
self::IN_FRAMESET => InFrameset::class,
self::AFTER_FRAMESET => AfterFrameset::class,
self::AFTER_AFTER_BODY => AfterAfterBody::class,
self::AFTER_AFTER_FRAMESET => AfterAfterFrameset::class,
self::IN_FOREIGN_CONTENT => InForeignContent::class,
self::IN_PRE => InPre::class,
self::IN_TEXTAREA => InTextarea::class,
)

The handler class for each insertion mode.
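One plausible way to turn a class-name map like this into a table of handler objects (the role $dispatchTable plays) is sketched below. The stub classes, the Mode interface, and buildDispatchTable() are all hypothetical, and the sketch does not claim to match how RemexHtml actually constructs its handlers, which may take constructor arguments.

```php
<?php
// Hypothetical stand-ins for insertion mode handler classes.
interface Mode {
	public function name(): string;
}

final class InBodyStub implements Mode {
	public function name(): string {
		return 'in body';
	}
}

final class TextStub implements Mode {
	public function name(): string {
		return 'text';
	}
}

// Instantiate one handler per insertion mode index.
function buildDispatchTable( array $handlerClasses ): array {
	$table = [];
	foreach ( $handlerClasses as $index => $class ) {
		$table[$index] = new $class();
	}
	return $table;
}

// Indexes 7 and 8 mirror IN_BODY and TEXT from the constants on this page.
$table = buildDispatchTable( [
	7 => InBodyStub::class,
	8 => TextStub::class,
] );
echo $table[8]->name(), "\n";
```

Switching modes then reduces to an array lookup by index, which keeps the per-token dispatch cost low.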

◆ $inBody

InBody Wikimedia\RemexHtml\TreeBuilder\Dispatcher::$inBody

◆ $inForeign

InForeignContent Wikimedia\RemexHtml\TreeBuilder\Dispatcher::$inForeign

◆ $inHead

InHead Wikimedia\RemexHtml\TreeBuilder\Dispatcher::$inHead

◆ $inSelect

InSelect Wikimedia\RemexHtml\TreeBuilder\Dispatcher::$inSelect

◆ $inTable

InTable Wikimedia\RemexHtml\TreeBuilder\Dispatcher::$inTable

◆ $inTemplate

InTemplate Wikimedia\RemexHtml\TreeBuilder\Dispatcher::$inTemplate
