Parsoid
A bidirectional parser between wikitext and HTML5
Tokenizer for wikitext, using WikiPEG and a separate PEG grammar file (Grammar.pegphp).
Public Member Functions | |
| __construct (Env $env, array $options=[], string $stageId="") | |
| getOptions () | |
| Get the constructor options. | |
| getFrame () | |
| process (string|array|DocumentFragment|Element $input, array $options) | |
| See PipelineStage::process docs as well. | |
| processChunkily (string|array|DocumentFragment|Element $input, array $options) | |
| The text is tokenized in chunks (one per top-level block). | |
| finalize () | |
| Finalize stage. This lets us not worry about tracking EOFTk but still ensures that we always exit the pipeline, whatever the error. | |
| tokenizeSync (string $text, array $args, &$exception=null) | |
| Tokenize via a rule passed in as an arg. | |
| tokenizeAs (string|Source $text, string $rule, bool $sol) | |
| Tokenizes a string as a rule. | |
| tokenizeURL (string $text) | |
| Tokenize a URL. | |
| tokenizeTableCellAttributes (string $text, bool $sol) | |
| Tokenize table cell attributes. | |
| resetState (array $options) | |
| Resets any internal state for this pipeline stage. This is usually called so that a cached pipeline can be reused. | |
Public Member Functions inherited from Wikimedia\Parsoid\Wt2Html\PipelineStage | |
| __construct (Env $env) | |
| setPipelineId (int $id) | |
| getPipelineId () | |
| getEnv () | |
| addTransformer (TokenHandler $t) | |
| Register a token transformer. | |
| setFrame (Frame $frame) | |
| Set frame on this pipeline stage. | |
| setSrcOffsets (SourceRange $srcOffsets) | |
| Set the source offsets for the content being processed by this pipeline. | |
| getSrcOffsets () | |
Additional Inherited Members | |
Protected Attributes inherited from Wikimedia\Parsoid\Wt2Html\PipelineStage | |
| int | $pipelineId = -1 |
| This is primarily a debugging aid. | |
| Env | $env = null |
| bool | $atTopLevel = false |
| Defaults to false and resetState initializes it. | |
| bool | $toFragment = true |
| Frame | $frame = null |
| Both these default to null and are set by helper methods. | |
| SourceRange | $srcOffsets = null |
Wikimedia\Parsoid\Wt2Html\PegTokenizer::finalize ()
Finalize stage. This lets us not worry about tracking EOFTk but still ensures that we always exit the pipeline, whatever the error.
Reimplemented from Wikimedia\Parsoid\Wt2Html\PipelineStage.
Wikimedia\Parsoid\Wt2Html\PegTokenizer::process (string|array|DocumentFragment|Element $input, array $options)
See PipelineStage::process docs as well.
This doc block refines the generic arg types to be specific to this pipeline stage.
| string|array|DocumentFragment|Element | $input | Wikitext to tokenize. In practice this should be a string. |
| array{sol:bool} | $options | |
Exceptions
| SyntaxError | |
Reimplemented from Wikimedia\Parsoid\Wt2Html\PipelineStage.
Wikimedia\Parsoid\Wt2Html\PegTokenizer::processChunkily (string|array|DocumentFragment|Element $input, array $options)
The text is tokenized in chunks (one per top-level block).
| string|array|DocumentFragment|Element | $input | Wikitext to tokenize. In practice this should be a string. |
| array{atTopLevel:bool,sol:bool} | $options | |
Reimplemented from Wikimedia\Parsoid\Wt2Html\PipelineStage.
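To make the "one chunk per top-level block" idea concrete, here is a minimal, self-contained sketch of chunked tokenization with a PHP generator. It does not use Parsoid: `tokenizeChunkily` is a hypothetical stand-in, blocks are split on blank lines, and whitespace-separated words play the role of tokens. The real stage derives block boundaries from the PEG grammar instead.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical stand-in for a chunked tokenizer (not Parsoid's API).
 * Yields one array of "tokens" per block. Blocks are separated by
 * blank lines here, which is far cruder than a real PEG grammar.
 *
 * @return Generator<int, string[]>
 */
function tokenizeChunkily(string $wikitext): Generator
{
    $blocks = preg_split('/\n{2,}/', $wikitext, -1, PREG_SPLIT_NO_EMPTY);
    foreach ($blocks as $block) {
        // Whitespace-separated words stand in for tokens.
        yield preg_split('/\s+/', trim($block), -1, PREG_SPLIT_NO_EMPTY);
    }
}

// The caller receives each chunk as soon as it is produced.
$chunks = [];
foreach (tokenizeChunkily("== Title ==\n\nFirst paragraph.\n\nSecond one.") as $chunk) {
    $chunks[] = $chunk;
}
```

Yielding per block lets a downstream consumer start work on the first block before the rest of the input has been tokenized, which is the usual motivation for a chunked interface.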
Wikimedia\Parsoid\Wt2Html\PegTokenizer::resetState (array $options)
Resets any internal state for this pipeline stage. This is usually called so that a cached pipeline can be reused.
Reimplemented from Wikimedia\Parsoid\Wt2Html\PipelineStage.
Wikimedia\Parsoid\Wt2Html\PegTokenizer::tokenizeAs (string|Source $text, string $rule, bool $sol)
Tokenizes a string as a rule.
| string|Source | $text | The input text |
| string | $rule | The rule name |
| bool | $sol | Start of line flag |
Wikimedia\Parsoid\Wt2Html\PegTokenizer::tokenizeSync (string $text, array $args, &$exception=null)
Tokenize via a rule passed in as an arg.
The text is tokenized synchronously in one shot.
| string | $text | |
| array{sol:bool} | $args | |
| SyntaxError|null | &$exception | A syntax error, if thrown. |
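The by-reference `&$exception` parameter may be unfamiliar, so here is a minimal sketch of that calling convention in isolation. Everything in it is hypothetical: `tokenizeSyncSketch` and `SketchSyntaxError` are illustrative names, the "tokenizer" only splits on whitespace, and returning `false` on failure is an assumption that the listing above does not state.

```php
<?php
declare(strict_types=1);

/** Hypothetical error type standing in for the tokenizer's SyntaxError. */
class SketchSyntaxError extends Exception
{
}

/**
 * Hypothetical tokenizer (not Parsoid's API). Returns the tokens, or
 * false on failure, in which case the error is stored in $exception.
 *
 * @param array{sol:bool} $args Unused here; kept to mirror the documented shape.
 * @return string[]|false
 */
function tokenizeSyncSketch(string $text, array $args, &$exception = null)
{
    $exception = null; // Clear any error left over from a previous call.
    try {
        if (strpos($text, "\0") !== false) {
            throw new SketchSyntaxError('unexpected NUL byte');
        }
        return preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);
    } catch (SketchSyntaxError $e) {
        $exception = $e; // Hand the error back through the reference.
        return false;
    }
}

// Success: tokens are returned and $err stays null.
$ok = tokenizeSyncSketch('[[Foo]] bar', ['sol' => true], $err);
$okErr = $err;

// Failure: false is returned and $err holds the captured error.
$bad = tokenizeSyncSketch("a\0b", ['sol' => true], $err);
$badErr = $err;
```

With this convention the caller can branch on the return value and inspect the error only when needed, instead of wrapping every call in its own `try`/`catch`.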
Wikimedia\Parsoid\Wt2Html\PegTokenizer::tokenizeTableCellAttributes (string $text, bool $sol)
Tokenize table cell attributes.
| string | $text | |
| bool | $sol | |
Wikimedia\Parsoid\Wt2Html\PegTokenizer::tokenizeURL (string $text)
Tokenize a URL.
| string | $text | |