Parsoid
A bidirectional parser between wikitext and HTML5
Wikimedia\Parsoid\Wt2Html\PipelineStage Class Reference

This represents the abstract interface for a wt2html parsing pipeline stage.


Public Member Functions

 __construct (Env $env)
 
 setPipelineId (int $id)
 
 getPipelineId ()
 
 getEnv ()
 
 addTransformer (TokenHandler $t)
 Register a token transformer.
 
 resetState (array $options)
 Resets any internal state for this pipeline stage.
 
 setFrame (Frame $frame)
 Set frame on this pipeline stage.
 
 setSrcOffsets (SourceRange $srcOffsets)
 Set the source offsets for the content being processed by this pipeline.
 
 getSrcOffsets ()
 
 process (string|array|DocumentFragment|Element $input, array $options)
 Process wikitext, an array of tokens, or a DOM document depending on what pipeline stage this is.
 
 processChunkily (string|array|DocumentFragment|Element $input, array $options)
 Process wikitext, an array of tokens, or a DOM document depending on what pipeline stage this is.
 
 finalize ()
 Finalize stage.
 

Protected Attributes

int $pipelineId = -1
 This is primarily a debugging aid.
 
Env $env = null
 
bool $atTopLevel = false
 Defaults to false and resetState initializes it.
 
bool $toFragment = true
 
Frame $frame = null
 Both these default to null and are set by helper methods.
 
SourceRange $srcOffsets = null
 

Detailed Description

This represents the abstract interface for a wt2html parsing pipeline stage. Currently there are four known pipeline stages:

  • PegTokenizer
  • TokenHandlerPipeline
  • TreeBuilder/TreeBuilderStage ( Remex-based HTML5 Tree Builder )
  • DOMProcessorPipeline

Member Function Documentation

◆ addTransformer()

Wikimedia\Parsoid\Wt2Html\PipelineStage::addTransformer ( TokenHandler $t)

Register a token transformer.

Reimplemented in Wikimedia\Parsoid\Wt2Html\TokenHandlerPipeline.

◆ finalize()

Wikimedia\Parsoid\Wt2Html\PipelineStage::finalize ( )
abstract

Finalize stage.

This lets us not worry about tracking EOFTk but still ensures that we always exit the pipeline no matter the error.

Reimplemented in Wikimedia\Parsoid\Wt2Html\DOMProcessorPipeline, Wikimedia\Parsoid\Wt2Html\PegTokenizer, Wikimedia\Parsoid\Wt2Html\TokenHandlerPipeline, and Wikimedia\Parsoid\Wt2Html\TreeBuilder\TreeBuilderStage.
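
The guarantee described for finalize() (the stage is always exited, whatever the error) can be illustrated with a try/finally wrapper. This is a minimal standalone sketch, not Parsoid's actual calling code: SketchStage, runStage(), and the 'BAD' token are hypothetical names invented for the example, and how Parsoid itself invokes finalize() is not shown on this page.

```php
<?php
declare(strict_types=1);

/** Hypothetical stage that records whether finalize() ran. */
final class SketchStage {
    public bool $finalized = false;

    /**
     * Upper-cases every token; throws on the made-up token 'BAD'.
     *
     * @param list<string> $input
     * @return list<string>
     */
    public function process(array $input): array {
        foreach ($input as $tok) {
            if ($tok === 'BAD') {
                throw new RuntimeException('stage failed');
            }
        }
        return array_map('strtoupper', $input);
    }

    public function finalize(): void {
        $this->finalized = true;
    }
}

/**
 * Runs a stage and calls finalize() whether process() returns or throws.
 *
 * @param list<string> $input
 * @return list<string>
 */
function runStage(SketchStage $stage, array $input): array {
    try {
        return $stage->process($input);
    } finally {
        // Executed on both the normal and the exceptional path.
        $stage->finalize();
    }
}
```

With this shape the caller does not need to watch for an end-of-input token to know when to clean up; the finally block runs on every exit path.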

◆ process()

Wikimedia\Parsoid\Wt2Html\PipelineStage::process ( string|array|DocumentFragment|Element $input,
array $options )
abstract

Process wikitext, an array of tokens, or a DOM document depending on what pipeline stage this is.

This will be the entirety of the input processed by this pipeline stage; no further input or EOF signal will follow.

Parameters
string | array | DocumentFragment | Element $input
array{atTopLevel:bool,sol:bool} $options
  • atTopLevel: (bool) Whether we are processing the top-level document
  • sol: (bool) Whether input should be processed in start-of-line context
  • chunky: (bool) Whether we are processing the input chunkily.
Returns
list<Token|string>|DocumentFragment|Element

Reimplemented in Wikimedia\Parsoid\Wt2Html\DOMProcessorPipeline, Wikimedia\Parsoid\Wt2Html\PegTokenizer, and Wikimedia\Parsoid\Wt2Html\TreeBuilder\TreeBuilderStage.

◆ processChunkily()

Wikimedia\Parsoid\Wt2Html\PipelineStage::processChunkily ( string|array|DocumentFragment|Element $input,
array $options )
abstract

Process wikitext, an array of tokens, or a DOM document depending on what pipeline stage this is.

This method will either directly or indirectly implement a generator that parses the input in chunks and yields output in chunks as well.

Implementations that don't consume tokens (e.g., Tokenizer, DOMProcessorPipeline) will provide specialized implementations that handle their input type.

Parameters
string | array | DocumentFragment | Element $input
array{atTopLevel:bool,sol:bool} $options
  • atTopLevel: (bool) Whether we are processing the top-level document
  • sol: (bool) Whether input should be processed in start-of-line context
Returns
Generator<list<Token|string>|DocumentFragment|Element>

Reimplemented in Wikimedia\Parsoid\Wt2Html\DOMProcessorPipeline, Wikimedia\Parsoid\Wt2Html\PegTokenizer, and Wikimedia\Parsoid\Wt2Html\TreeBuilder\TreeBuilderStage.
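
The chunked contract above (consume the input in pieces, yield the output in pieces) maps naturally onto a PHP generator. The following is a minimal standalone sketch under stated assumptions, not the Parsoid implementation: processChunkilySketch() and processSketch() are hypothetical functions, and splitting on newlines and whitespace merely stands in for real tokenization.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical chunked stage: yields one list of "tokens" per input line
 * instead of building the complete result up front.
 *
 * @return Generator<list<string>>
 */
function processChunkilySketch(string $input): Generator {
    foreach (explode("\n", $input) as $line) {
        // Each yielded value is one output chunk.
        yield preg_split('/\s+/', $line, -1, PREG_SPLIT_NO_EMPTY);
    }
}

/**
 * Non-chunked counterpart: drains the generator and concatenates the chunks.
 *
 * @return list<string>
 */
function processSketch(string $input): array {
    $out = [];
    foreach (processChunkilySketch($input) as $chunk) {
        $out = array_merge($out, $chunk);
    }
    return $out;
}
```

A consumer can iterate over processChunkilySketch() and hand each chunk to the next stage as it arrives, while processSketch() shows that the non-chunked result is just the concatenation of the chunks.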

◆ resetState()

Wikimedia\Parsoid\Wt2Html\PipelineStage::resetState ( array $options)

Resets any internal state for this pipeline stage.

This is usually called so a cached pipeline can be reused.

Reimplemented in Wikimedia\Parsoid\Wt2Html\DOMProcessorPipeline, Wikimedia\Parsoid\Wt2Html\PegTokenizer, Wikimedia\Parsoid\Wt2Html\TokenHandlerPipeline, and Wikimedia\Parsoid\Wt2Html\TreeBuilder\TreeBuilderStage.
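
To illustrate why a cached stage needs resetState() before reuse, here is a small standalone sketch. SketchCountingStage and its token counter are hypothetical; only the idea that resetState() initializes atTopLevel from the options comes from this page.

```php
<?php
declare(strict_types=1);

/** Hypothetical stage that accumulates per-run state. */
final class SketchCountingStage {
    private bool $atTopLevel = false;
    private int $seen = 0;

    /** Clears per-run state and (re)initializes atTopLevel from $options. */
    public function resetState(array $options): void {
        $this->atTopLevel = $options['atTopLevel'] ?? false;
        $this->seen = 0;
    }

    /**
     * Returns the number of tokens seen since the last reset.
     *
     * @param list<string> $tokens
     */
    public function process(array $tokens): int {
        $this->seen += count($tokens);
        return $this->seen;
    }

    public function isAtTopLevel(): bool {
        return $this->atTopLevel;
    }
}
```

Without the reset between runs, the counter from the first input would leak into the second, which is the kind of stale state a reused pipeline has to avoid.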

◆ setSrcOffsets()

Wikimedia\Parsoid\Wt2Html\PipelineStage::setSrcOffsets ( SourceRange $srcOffsets)

Set the source offsets for the content being processed by this pipeline.

This matters when a substring of the top-level page is being processed in its own pipeline: it ensures that all source offsets assigned to tokens and DOM nodes in this stage are relative to the top-level page.

Reimplemented in Wikimedia\Parsoid\Wt2Html\DOMProcessorPipeline.
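
The offset adjustment described above amounts to shifting fragment-local positions by the fragment's start in the page. The sketch below is a simplified illustration only: SketchRange is a hypothetical two-field class rather than Parsoid's SourceRange, locateInPage() is an invented helper, and offsets are byte offsets into ASCII text.

```php
<?php
declare(strict_types=1);

/** Hypothetical, simplified range; not Parsoid's SourceRange. */
final class SketchRange {
    public int $start;
    public int $end;

    public function __construct(int $start, int $end) {
        $this->start = $start;
        $this->end = $end;
    }
}

/**
 * Finds $needle inside a fragment and reports its range relative to the
 * top-level page by adding the fragment's start offset to the local offset.
 */
function locateInPage(string $fragment, SketchRange $fragmentOffsets, string $needle): ?SketchRange {
    $local = strpos($fragment, $needle);
    if ($local === false) {
        return null;
    }
    $start = $fragmentOffsets->start + $local;
    return new SketchRange($start, $start + strlen($needle));
}

// The fragment "foo bar" occupies bytes 13..20 of this page.
$page = "intro {{echo|foo bar}} outro";
$fragmentOffsets = new SketchRange(13, 20);
$fragment = substr($page, $fragmentOffsets->start, 7);
$range = locateInPage($fragment, $fragmentOffsets, 'bar');
```

Because the returned range is page-relative, slicing the original page with it recovers the matched text even though the search ran only on the fragment.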


The documentation for this class was generated from the following file: