Parsoid
A bidirectional parser between wikitext and HTML5
Wikimedia\Parsoid\Wt2Html\PegTokenizer Class Reference

Tokenizer for wikitext, using WikiPEG and a separate PEG grammar file (Grammar.pegphp).

Public Member Functions

 __construct (Env $env, array $options=[], string $stageId="")
 
 getOptions ()
 Get the constructor options.
 
 getFrame ()
 
 process (string|array|DocumentFragment|Element $input, array $options)
 See PipelineStage::process docs as well.
 
 processChunkily (string|array|DocumentFragment|Element $input, array $options)
 The text is tokenized in chunks (one per top-level block).
 
 finalize ()
 Finalize stage. This lets us avoid tracking EOFTk while still ensuring that we always exit the pipeline, whatever the error.
 
 tokenizeSync (string $text, array $args, &$exception=null)
 Tokenize via a rule passed in as an arg.
 
 tokenizeAs (string|Source $text, string $rule, bool $sol)
 Tokenizes a string as a rule.
 
 tokenizeURL (string $text)
 Tokenize a URL.
 
 tokenizeTableCellAttributes (string $text, bool $sol)
 Tokenize table cell attributes.
 
 resetState (array $options)
 Resets any internal state for this pipeline stage. This is usually called so a cached pipeline can be reused.
 
Public Member Functions inherited from Wikimedia\Parsoid\Wt2Html\PipelineStage
 __construct (Env $env)
 
 setPipelineId (int $id)
 
 getPipelineId ()
 
 getEnv ()
 
 addTransformer (TokenHandler $t)
 Register a token transformer.
 
 setFrame (Frame $frame)
 Set frame on this pipeline stage.
 
 setSrcOffsets (SourceRange $srcOffsets)
 Set the source offsets for the content being processed by this pipeline.
 
 getSrcOffsets ()
 

Additional Inherited Members

Protected Attributes inherited from Wikimedia\Parsoid\Wt2Html\PipelineStage
int $pipelineId = -1
 This is primarily a debugging aid.
 
Env $env = null
 
bool $atTopLevel = false
 Defaults to false and resetState initializes it.
 
bool $toFragment = true
 
Frame $frame = null
 Both these default to null and are set by helper methods.
 
SourceRange $srcOffsets = null
 

Detailed Description

Tokenizer for wikitext, using WikiPEG and a separate PEG grammar file (Grammar.pegphp)

Member Function Documentation

◆ finalize()

Wikimedia\Parsoid\Wt2Html\PegTokenizer::finalize ( )

Finalize stage. This lets us avoid tracking EOFTk while still ensuring that we always exit the pipeline, whatever the error.

Reimplemented from Wikimedia\Parsoid\Wt2Html\PipelineStage.

◆ process()

Wikimedia\Parsoid\Wt2Html\PegTokenizer::process ( string|array|DocumentFragment|Element $input,
array $options )

See PipelineStage::process docs as well.

This doc block refines the generic arg types to be specific to this pipeline stage.

Parameters
string|array|DocumentFragment|Element $input: Wikitext to tokenize. In practice this should be a string.
array{sol:bool} $options:
  • atTopLevel: (bool) Whether we are processing the top-level document.
  • sol: (bool) Whether input should be processed in start-of-line context.
Returns
array The token array
Exceptions
SyntaxError

Reimplemented from Wikimedia\Parsoid\Wt2Html\PipelineStage.

◆ processChunkily()

Wikimedia\Parsoid\Wt2Html\PegTokenizer::processChunkily ( string|array|DocumentFragment|Element $input,
array $options )

The text is tokenized in chunks (one per top-level block).

Parameters
string|array|DocumentFragment|Element $input: Wikitext to tokenize. In practice this should be a string.
array{atTopLevel:bool,sol:bool} $options:
  • atTopLevel: (bool) Whether we are processing the top-level document.
  • sol: (bool) Whether text should be processed in start-of-line context.
Returns
Generator<list<Token|string>>

Reimplemented from Wikimedia\Parsoid\Wt2Html\PipelineStage.
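
The generator return type means callers receive one list of tokens per chunk and can handle each chunk as it arrives. A minimal sketch of that consumption pattern follows. `processChunkilySketch()` is a hypothetical stand-in, not Parsoid code: it assumes blank lines separate blocks and uses whitespace-separated words as "tokens", and it ignores `$options`.

```php
<?php
/**
 * Hypothetical stand-in for a chunked tokenizer (NOT PegTokenizer itself).
 * Yields one list of "tokens" per block, mirroring Generator<list<Token|string>>.
 *
 * @param array{atTopLevel:bool,sol:bool} $options accepted only to mirror the documented shape
 * @return Generator<list<string>>
 */
function processChunkilySketch(string $input, array $options): Generator
{
    // Assumption for illustration: a blank line ends a top-level block.
    foreach (preg_split('/\n{2,}/', $input, -1, PREG_SPLIT_NO_EMPTY) as $block) {
        // Toy tokenization: split the block on whitespace.
        yield preg_split('/\s+/', trim($block), -1, PREG_SPLIT_NO_EMPTY);
    }
}

// Consume the chunks one at a time instead of waiting for the whole input.
$chunks = [];
$options = ['atTopLevel' => true, 'sol' => true];
foreach (processChunkilySketch("== A ==\n\nsome text", $options) as $chunk) {
    $chunks[] = $chunk;
}
```

With the real class, the same `foreach` loop would iterate over the value returned by `processChunkily()`.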

◆ resetState()

Wikimedia\Parsoid\Wt2Html\PegTokenizer::resetState ( array $options)

Resets any internal state for this pipeline stage. This is usually called so a cached pipeline can be reused.

Reimplemented from Wikimedia\Parsoid\Wt2Html\PipelineStage.

◆ tokenizeAs()

Wikimedia\Parsoid\Wt2Html\PegTokenizer::tokenizeAs ( string|Source $text,
string $rule,
bool $sol )

Tokenizes a string as a rule.

Parameters
string|Source $text: The input text.
string $rule: The rule name.
bool $sol: Start-of-line flag.
Returns
array|false Array of tokens/strings or false on error

◆ tokenizeSync()

Wikimedia\Parsoid\Wt2Html\PegTokenizer::tokenizeSync ( string $text,
array $args,
& $exception = null )

Tokenize via a rule passed in as an arg.

The text is tokenized synchronously in one shot.

Parameters
string $text
array{sol:bool} $args:
  • sol: (bool) Whether input should be processed in start-of-line context.
  • startRule: (string) Which tokenizer rule to tokenize with.
SyntaxError|null &$exception: A syntax error, if thrown.
Returns
array|false The token array, or false for a syntax error
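
The signature combines a `false` return value with a by-reference `&$exception` argument, so the caller learns both that tokenizing failed and why. The sketch below illustrates only that calling convention. `tokenizeSyncSketch()` is a hypothetical stand-in with a toy grammar, and `RuntimeException` stands in for the `SyntaxError` named above.

```php
<?php
/**
 * Hypothetical stand-in mirroring the documented calling convention of
 * tokenizeSync(); it is NOT Parsoid's implementation.
 *
 * @param array{sol:bool} $args accepted only to mirror the documented shape
 * @return array|false
 */
function tokenizeSyncSketch(string $text, array $args, &$exception = null)
{
    try {
        // Toy "grammar": unbalanced [[ ... ]] counts as a syntax error.
        if (substr_count($text, '[[') !== substr_count($text, ']]')) {
            throw new RuntimeException('unbalanced link brackets');
        }
        // Toy tokenization: split on whitespace.
        return preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);
    } catch (RuntimeException $e) {
        $exception = $e; // report the cause through the by-reference argument
        return false;    // and signal failure through the return value
    }
}

$err = null;
$ok = tokenizeSyncSketch('foo [[bar]]', ['sol' => true], $err);
$errAfterOk = $err; // still null: nothing was thrown
$bad = tokenizeSyncSketch('foo [[bar', ['sol' => true], $err);
```

After the second call `$bad` is `false` and `$err` holds the exception, so the caller can log or rethrow it.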

◆ tokenizeTableCellAttributes()

Wikimedia\Parsoid\Wt2Html\PegTokenizer::tokenizeTableCellAttributes ( string $text,
bool $sol )

Tokenize table cell attributes.

Parameters
string $text
bool $sol
Returns
array|false Array of tokens/strings or false on error

◆ tokenizeURL()

Wikimedia\Parsoid\Wt2Html\PegTokenizer::tokenizeURL ( string $text)

Tokenize a URL.

Parameters
string $text
Returns
array|false Array of tokens/strings or false on error
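
Several methods in this class return `array|false`. In PHP an empty array is loosely equal to `false`, so a caller that wants to tell "no tokens" apart from "error" needs a strict comparison. A small sketch of that check follows; `tokensOrFail()` is a hypothetical helper, not part of Parsoid.

```php
<?php
/**
 * Hypothetical helper: unwrap an array|false result, treating only a
 * strict false as failure.
 *
 * @param array|false $result value returned by a tokenize*() style call
 */
function tokensOrFail($result, string $what): array
{
    if ($result === false) {
        throw new InvalidArgumentException("could not tokenize $what");
    }
    return $result;
}

// The pitfall: a loose comparison cannot distinguish [] from false.
$looseMatches = ([] == false);

// An empty token list is a valid result and passes through unchanged.
$empty = tokensOrFail([], 'empty input');

// A strict false is reported as an error.
$failed = false;
try {
    tokensOrFail(false, 'bad URL');
} catch (InvalidArgumentException $e) {
    $failed = true;
}
```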

The documentation for this class was generated from the following file: