Parsoid
A bidirectional parser between wikitext and HTML5
Wikimedia\Parsoid\Wt2Html\PegTokenizer Class Reference

Public Member Functions

 __construct (Env $env, array $options=[], string $stageId="", ?PipelineStage $prevStage=null)
 
 getOptions ()
 Get the constructor options.
 
 setSourceOffsets (SourceRange $so)
 Set start and end offsets of the source that generated this DOM.
 
 process ( $input, array $opts)
 See PipelineStage::process docs as well.
 
 processChunkily ( $text, array $opts)
 The text is tokenized in chunks (one per top-level block) and registered event listeners are called with the chunk to let it get processed further.
 
 tokenizeSync (string $text, array $args)
 Tokenize via a rule passed in as an arg.
 
 tokenizeAs (string $text, string $rule, bool $sol)
 Tokenizes a string as a rule.
 
 tokenizeURL (string $text)
 Tokenize a URL.
 
 tokenizeTableCellAttributes (string $text, bool $sol)
 Tokenize table cell attributes.
 
 getLastErrorLogMessage ()
 If a tokenize method returned false, this will return a string describing the error, suitable for use in a log entry.
 
 resetState (array $opts)
 Resets any internal state for this pipeline stage. This is usually called so a cached pipeline can be reused.
Parameters
array  $opts

 
- Public Member Functions inherited from Wikimedia\Parsoid\Wt2Html\PipelineStage
 __construct (Env $env, ?PipelineStage $prevStage=null)
 
 setPipelineId (int $id)
 
 getPipelineId ()
 
 getEnv ()
 
 addTransformer (TokenHandler $t)
 Register a token transformer.
 
 setFrame (Frame $frame)
 Set frame on this pipeline stage.
 

Additional Inherited Members

- Protected Attributes inherited from Wikimedia\Parsoid\Wt2Html\PipelineStage
 $prevStage
 
 $pipelineId = -1
 
 $env = null
 
bool $atTopLevel = false
 Defaults to false and resetState initializes it.
 
bool $toFragment = true
 
 $frame
 

Member Function Documentation

◆ getLastErrorLogMessage()

Wikimedia\Parsoid\Wt2Html\PegTokenizer::getLastErrorLogMessage ( )

If a tokenize method returned false, this will return a string describing the error, suitable for use in a log entry.

If there has not been any error, returns false.

Returns
string|false
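
The false-then-message contract described above can be wrapped in a small helper. This is a minimal sketch, not Parsoid code: `tokenizeUrlOrThrow` is a hypothetical name, and the anonymous class is a stand-in with made-up behavior so the example runs without a Parsoid `Env`. Only `tokenizeURL()` and `getLastErrorLogMessage()` come from this page.

```php
<?php
/**
 * Tokenize $text as a URL, throwing if the tokenizer reports an error.
 * $tokenizer is duck-typed: any object exposing tokenizeURL() and
 * getLastErrorLogMessage() as documented on this page.
 */
function tokenizeUrlOrThrow(object $tokenizer, string $text): array
{
    $tokens = $tokenizer->tokenizeURL($text);
    // Compare strictly: an empty token array is falsy but is not an error.
    if ($tokens === false) {
        $message = $tokenizer->getLastErrorLogMessage();
        throw new RuntimeException(
            $message === false ? 'Unknown tokenizer error' : $message
        );
    }
    return $tokens;
}

// Hypothetical stand-in for PegTokenizer; its rules are invented for the demo.
$stub = new class {
    private $lastError = false;

    public function tokenizeURL(string $text)
    {
        if (strpos($text, ' ') !== false) {
            $this->lastError = 'Stub: whitespace in URL';
            return false;
        }
        return [$text];
    }

    public function getLastErrorLogMessage()
    {
        return $this->lastError;
    }
};

$ok = tokenizeUrlOrThrow($stub, 'https://example.org/');

$caught = null;
try {
    tokenizeUrlOrThrow($stub, 'not a url');
} catch (RuntimeException $e) {
    $caught = $e->getMessage();
}
```

The strict `=== false` check is the point of the sketch: the tokenize methods return `array|false`, so a loose truthiness test would treat an empty result as a failure.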

◆ process()

Wikimedia\Parsoid\Wt2Html\PegTokenizer::process ( $input,
array $opts )

See PipelineStage::process docs as well.

This doc block refines the generic arg types to be specific to this pipeline stage.

Parameters
string  $input  wikitext to tokenize
array{sol:bool}  $opts
  • atTopLevel: (bool) Whether we are processing the top-level document
  • sol: (bool) Whether input should be processed in start-of-line context
Returns
array|false The token array, or false for a syntax error

Reimplemented from Wikimedia\Parsoid\Wt2Html\PipelineStage.

◆ processChunkily()

Wikimedia\Parsoid\Wt2Html\PegTokenizer::processChunkily ( $text,
array $opts )

The text is tokenized in chunks (one per top-level block) and registered event listeners are called with the chunk to let it get processed further.

The main worker. Sets up event emission ('chunk' and 'end' events). Consumers are supposed to register with PegTokenizer before calling process().

Parameters
string  $text
array{sol:bool}  $opts
  • sol: (bool) Whether text should be processed in start-of-line context.
Returns
Generator

Reimplemented from Wikimedia\Parsoid\Wt2Html\PipelineStage.
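
Because processChunkily() returns a Generator, a caller has to iterate it to get any chunks. The sketch below shows one way to drain it. It assumes each yielded value is one chunk (the page does not spell out the yielded type), `collectChunks` is a hypothetical helper, and the anonymous class is a stand-in that splits on blank lines purely for illustration; it is not how the real tokenizer finds top-level blocks.

```php
<?php
/**
 * Drain the generator returned by processChunkily() into a list of chunks.
 * $tokenizer is duck-typed so the helper also accepts the stand-in below.
 */
function collectChunks(object $tokenizer, string $wikitext): array
{
    $chunks = [];
    // PHP generators are lazy: the generator body only runs as the
    // foreach loop pulls values from it.
    foreach ($tokenizer->processChunkily($wikitext, ['sol' => true]) as $chunk) {
        $chunks[] = $chunk;
    }
    return $chunks;
}

// Hypothetical stand-in for PegTokenizer: yields one single-element
// chunk per blank-line-separated block of the input.
$stub = new class {
    public function processChunkily($text, array $opts): Generator
    {
        foreach (preg_split('/\n{2,}/', $text) as $block) {
            yield [$block];
        }
    }
};

$chunks = collectChunks(
    $stub,
    "== Heading ==\n\nFirst paragraph.\n\nSecond paragraph."
);
```

Collecting everything into an array gives up the streaming benefit; a caller that can handle chunks one at a time would do its work inside the loop instead.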

◆ resetState()

Wikimedia\Parsoid\Wt2Html\PegTokenizer::resetState ( array $opts)

Resets any internal state for this pipeline stage. This is usually called so a cached pipeline can be reused.

Parameters
array  $opts

Reimplemented from Wikimedia\Parsoid\Wt2Html\PipelineStage.

◆ setSourceOffsets()

Wikimedia\Parsoid\Wt2Html\PegTokenizer::setSourceOffsets ( SourceRange $so)

Set start and end offsets of the source that generated this DOM.

Parameters
SourceRange  $so

Reimplemented from Wikimedia\Parsoid\Wt2Html\PipelineStage.

◆ tokenizeAs()

Wikimedia\Parsoid\Wt2Html\PegTokenizer::tokenizeAs ( string $text,
string $rule,
bool $sol )

Tokenizes a string as a rule.

Parameters
string  $text  The input text
string  $rule  The rule name
bool  $sol  Start of line flag
Returns
array|false Array of tokens/strings or false on error

◆ tokenizeSync()

Wikimedia\Parsoid\Wt2Html\PegTokenizer::tokenizeSync ( string $text,
array $args )

Tokenize via a rule passed in as an arg.

The text is tokenized synchronously in one shot.

Parameters
string  $text
array{sol:bool}  $args
  • sol: (bool) Whether input should be processed in start-of-line context.
  • startRule: (string) which tokenizer rule to tokenize with
Returns
array|false The token array, or false for a syntax error
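
The shape of `$args` can be sketched as follows. This is an illustration under stated assumptions: the `array{sol:bool}` annotation suggests `sol` is required and `startRule` optional, `buildTokenizeArgs` is a hypothetical helper, the rule name `'url'` is assumed rather than taken from this page, and the anonymous class is a stand-in that only records what it receives.

```php
<?php
/**
 * Build the $args array for tokenizeSync().
 * 'sol' is always set; 'startRule' is added only when a rule is given.
 */
function buildTokenizeArgs(bool $sol, ?string $startRule = null): array
{
    $args = ['sol' => $sol];
    if ($startRule !== null) {
        $args['startRule'] = $startRule;
    }
    return $args;
}

// Hypothetical stand-in for PegTokenizer: records the args it was given
// and returns the text as a single "token".
$stub = new class {
    public $seen = [];

    public function tokenizeSync(string $text, array $args)
    {
        $this->seen[] = $args;
        return [$text];
    }
};

// 'url' is an assumed rule name, used here only as an example value.
$tokens = $stub->tokenizeSync(
    'https://example.org/',
    buildTokenizeArgs(true, 'url')
);
```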

◆ tokenizeTableCellAttributes()

Wikimedia\Parsoid\Wt2Html\PegTokenizer::tokenizeTableCellAttributes ( string $text,
bool $sol )

Tokenize table cell attributes.

Parameters
string  $text
bool  $sol
Returns
array|false Array of tokens/strings or false on error

◆ tokenizeURL()

Wikimedia\Parsoid\Wt2Html\PegTokenizer::tokenizeURL ( string $text)

Tokenize a URL.

Parameters
string  $text
Returns
array|false Array of tokens/strings or false on error

The documentation for this class was generated from the following file: