Parsoid
A bidirectional parser between wikitext and HTML5
Wikimedia\Parsoid\Wt2Html\DOMPostProcessor Class Reference

Perform post-processing steps on an already-built HTML DOM.
Public Member Functions

__construct (Env $env, array $options=[], string $stageId="", ?PipelineStage $prevStage=null)
registerProcessors (?array $processors)
getDefaultProcessors ()
setSourceOffsets (SourceRange $so)
    Set the source offsets for the content being processed by this pipeline. This matters when a substring of the top-level page is being processed in its own pipeline, and ensures that all source offsets assigned to tokens and DOM nodes in this stage are relative to the top-level page.
resetState (array $options)
    Resets any internal state for this pipeline stage. This is usually called so a cached pipeline can be reused.
addMetaData (Env $env, Document $document)
    FIXME: consider moving to DOMUtils or Env.
doPostProcess (Node $node)
process ($node, array $opts=null)
processChunkily ($input, ?array $options)
    Process wikitext, an array of tokens, or a DOM document depending on what pipeline stage this is. This method will either directly or indirectly implement a generator that parses the input in chunks and yields output in chunks as well. Implementations that don't consume tokens (e.g. Tokenizer, DOMPostProcessor) provide specialized implementations that handle their input type.
Public Member Functions inherited from Wikimedia\Parsoid\Wt2Html\PipelineStage

__construct (Env $env, ?PipelineStage $prevStage=null)
setPipelineId (int $id)
getPipelineId ()
getEnv ()
addTransformer (TokenHandler $t)
    Register a token transformer.
setFrame (Frame $frame)
    Set frame on this pipeline stage.
process ($input, ?array $options=null)
    Process wikitext, an array of tokens, or a DOM document depending on what pipeline stage this is.
Additional Inherited Members

Protected Attributes inherited from Wikimedia\Parsoid\Wt2Html\PipelineStage

$prevStage
$pipelineId = -1
$env = null
$atTopLevel
$frame
Detailed Description

Perform post-processing steps on an already-built HTML DOM.
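A minimal usage sketch follows. It is not taken from the Parsoid source: it assumes an already-configured Env instance and an already-built DOM document obtained elsewhere, and it assumes that getDefaultProcessors() returns the processor spec array that registerProcessors() expects.

<?php
use Wikimedia\Parsoid\Wt2Html\DOMPostProcessor;

// $env: a configured Env instance (assumed to be Wikimedia\Parsoid\Config\Env).
// $doc: the already-built DOM document for the page (assumed).

// Construct the post-processing stage for this environment.
$postProcessor = new DOMPostProcessor( $env );

// Register the stage's default set of DOM processors.
// (Assumption: the return value of getDefaultProcessors() is the
// processor spec array that registerProcessors() accepts.)
$postProcessor->registerProcessors( $postProcessor->getDefaultProcessors() );

// Run the registered post-processing passes over the built DOM.
// (Assumption: the document node is an acceptable Node argument here.)
$postProcessor->doPostProcess( $doc );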
Wikimedia\Parsoid\Wt2Html\DOMPostProcessor::__construct ( Env $env, array $options = [], string $stageId = "", ?PipelineStage $prevStage = null )

Parameters
    Env $env
    array $options
    string $stageId
    ?PipelineStage $prevStage
Wikimedia\Parsoid\Wt2Html\DOMPostProcessor::addMetaData ( Env $env, Document $document )

FIXME: consider moving to DOMUtils or Env.

Parameters
    Env $env
    Document $document

FIXME: The JS side has a bunch of other checks here.
Wikimedia\Parsoid\Wt2Html\DOMPostProcessor::doPostProcess ( Node $node )

Parameters
    Node $node
Wikimedia\Parsoid\Wt2Html\DOMPostProcessor::getDefaultProcessors ( )
FIXME: There are two potential ordering problems here.
This ordering issue is handled through documentation.
The ideal solution to this problem is to require that every extension's extensionPostProcessor be idempotent, which would let us run these post processors repeatedly till the DOM stabilizes. But this still doesn't necessarily guarantee that ordering doesn't matter. It only guarantees that, with the unpackOutput flag set to false for multiple extensions, all sealed fragments get fully processed. So we still need to worry about that problem.
But idempotence could potentially be a sufficient property in most cases. To see this, consider a Footnotes extension that is similar to the Cite extension in that both extract inline content from the page source into a separate section of the output and leave behind pointers to that global section in the output DOM. Given this, the Cite and Footnotes extension post processors would essentially walk the DOM and move any existing inline content into their global section until they are done. So, even if a <footnote> has a <ref> and a <ref> has a <footnote>, we ultimately end up with all footnote content in the footnotes section and all ref content in the references section, and the DOM stabilizes. Ordering is irrelevant here (see the sketch below).
So, perhaps one way of catching these problems would be in code review: analyze what the DOM post processor does and see whether it introduces potential ordering issues.
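To make the idempotence argument above concrete, here is a hypothetical sketch of such a pass. It is not a real Parsoid extension API: the function name, the mw:Footnote markup, and the use of PHP's built-in \DOMDocument/\DOMXPath classes are all assumptions made purely for illustration.

<?php
/**
 * Move every <span typeof="mw:Footnote"> into a single
 * <section id="footnotes">. Running the pass again is a no-op, because
 * nodes already inside the section are skipped, so the DOM stabilizes no
 * matter how often (or in what order relative to similar passes) it runs.
 */
function moveFootnotesToSection( \DOMDocument $doc ): void {
	$xpath = new \DOMXPath( $doc );

	// Find the global footnotes section, or create it on the first run.
	$section = $xpath->query( '//section[@id="footnotes"]' )->item( 0 );
	if ( $section === null ) {
		$section = $doc->createElement( 'section' );
		$section->setAttribute( 'id', 'footnotes' );
		$doc->documentElement->appendChild( $section );
	}

	foreach ( $xpath->query( '//span[@typeof="mw:Footnote"]' ) as $footnote ) {
		// Skip nodes a previous run already moved into the section;
		// this check is what makes the pass idempotent.
		for ( $a = $footnote->parentNode; $a !== null; $a = $a->parentNode ) {
			if ( $a === $section ) {
				continue 2;
			}
		}
		// Leave a pointer behind and move the content to the global section.
		$marker = $doc->createElement( 'sup' );
		$marker->setAttribute( 'class', 'footnote-marker' );
		$footnote->parentNode->replaceChild( $marker, $footnote );
		$section->appendChild( $footnote );
	}
}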
Wikimedia\Parsoid\Wt2Html\DOMPostProcessor::processChunkily ( $input, ?array $options )

Process wikitext, an array of tokens, or a DOM document depending on what pipeline stage this is. This method will either directly or indirectly implement a generator that parses the input in chunks and yields output in chunks as well. Implementations that don't consume tokens (e.g. Tokenizer, DOMPostProcessor) provide specialized implementations that handle their input type.

Parameters
    string|array|Document $input
    ?array $options
Reimplemented from Wikimedia\Parsoid\Wt2Html\PipelineStage.
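Since this stage's implementation is documented as a generator, a minimal consumption sketch might look like the following. It assumes $postProcessor is an already-configured DOMPostProcessor and $doc is the DOM document to post-process; what each yielded chunk contains for this stage is not documented on this page, so the loop body is illustrative only.

<?php
foreach ( $postProcessor->processChunkily( $doc, null ) as $chunk ) {
	// Handle each yielded chunk as it becomes available.
	var_dump( $chunk );
}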
Wikimedia\Parsoid\Wt2Html\DOMPostProcessor::registerProcessors ( ?array $processors )

Parameters
    ?array $processors
Wikimedia\Parsoid\Wt2Html\DOMPostProcessor::resetState ( array $options )

Resets any internal state for this pipeline stage. This is usually called so a cached pipeline can be reused.

Parameters
    array $options
Reimplemented from Wikimedia\Parsoid\Wt2Html\PipelineStage.
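A sketch of the cached-pipeline reuse mentioned above. $postProcessor, $optionsForNextUse, and $nextDoc are placeholders supplied by the surrounding pipeline code, and the option keys themselves are not documented on this page.

<?php
// Reset per-document state before the cached stage is used again.
$postProcessor->resetState( $optionsForNextUse );

// Then run it on the next document.
$postProcessor->doPostProcess( $nextDoc );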
Wikimedia\Parsoid\Wt2Html\DOMPostProcessor::setSourceOffsets ( SourceRange $so )

Set the source offsets for the content being processed by this pipeline. This matters when a substring of the top-level page is being processed in its own pipeline, and ensures that all source offsets assigned to tokens and DOM nodes in this stage are relative to the top-level page.

Parameters
    SourceRange $so
Reimplemented from Wikimedia\Parsoid\Wt2Html\PipelineStage.
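A sketch of setting the offsets when only a substring of the top-level page is processed in this pipeline. It assumes SourceRange can be constructed from a start/end offset pair into the top-level page source; the namespace used and the $start/$end placeholders are assumptions.

<?php
use Wikimedia\Parsoid\Tokens\SourceRange;

// $start, $end: offsets of the substring within the top-level page source
// (placeholders, supplied by the caller).
$postProcessor->setSourceOffsets( new SourceRange( $start, $end ) );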