Parsoid
A bidirectional parser between wikitext and HTML5
Wikimedia\Parsoid\Wt2Html\TT\PreHandler Class Reference

PRE-handling relies on the following 6-state FSM.


Public Member Functions

 __construct (TokenTransformManager $manager, array $options)
 
 resetState (array $opts)
 Resets any internal state for this token handler.
 
 onNewline (NlTk $token)
 This handler is called for newline tokens only.
Parameters
NlTk $token - Newline token to be processed
Returns
TokenHandlerResult|null A TokenHandlerResult, or null to efficiently indicate that the input token is unchanged.

 
 onEnd (EOFTk $token)
 This handler is called for EOF tokens only.
Parameters
EOFTk $token - EOF token to be processed
Returns
TokenHandlerResult|null A TokenHandlerResult, or null to efficiently indicate that the input token is unchanged.

 
 onAny ( $token)
 This handler is called for all tokens in the token stream except when (a) one of the more specific handlers above modified the token, (b) the more specific handlers (onTag, onEnd, onNewline) set the skip flag in their return values, or (c) this handler's 'active' flag is set to false (any of the handlers can set it).
Parameters
Token|string $token - Token to be processed
Returns
TokenHandlerResult|null A TokenHandlerResult, or null to efficiently indicate that the input token is unchanged.

 
Public Member Functions inherited from Wikimedia\Parsoid\Wt2Html\TT\TokenHandler
 setPipelineId (int $id)
 
 isDisabled ()
 Is this transformer disabled?
 
 onTag (Token $token)
 This handler is called for tokens that are not EOFTk or NlTk tokens.
 
 process ( $tokens)
 Push an input array of tokens through the transformer and return the transformed tokens.
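The dispatch rules described for onAny, together with the inherited process() loop, can be illustrated with a small Python sketch. Everything in it is hypothetical: NlTk, EOFTk and TagTk are stand-in dataclasses, Result stands in for TokenHandlerResult, and the skip flag and on_any_enabled (modeled on $onAnyEnabled) are plain booleans. Parsoid's real TokenHandler is PHP and differs in detail; the sketch only shows the order in which the handlers are consulted.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class NlTk:
    """Hypothetical newline token."""


@dataclass
class EOFTk:
    """Hypothetical end-of-input token."""


@dataclass
class TagTk:
    """Hypothetical tag token (anything that is not NlTk/EOFTk)."""
    name: str


Token = Union[NlTk, EOFTk, TagTk, str]


@dataclass
class Result:
    """Hypothetical stand-in for TokenHandlerResult."""
    tokens: List[Token]
    skip_on_any: bool = False


class SketchHandler:
    """Illustrates the order in which handlers are consulted; not Parsoid code."""

    def __init__(self) -> None:
        self.on_any_enabled = True  # modeled on $onAnyEnabled
        self.log: List[str] = []

    def on_newline(self, tok: NlTk) -> Optional[Result]:
        self.log.append("nl")
        return None  # None signals "token unchanged"

    def on_end(self, tok: EOFTk) -> Optional[Result]:
        self.log.append("eof")
        return None

    def on_tag(self, tok: TagTk) -> Optional[Result]:
        self.log.append("tag:" + tok.name)
        return None

    def on_any(self, tok: Token) -> Optional[Result]:
        self.log.append("any")
        return None

    def process(self, tokens: List[Token]) -> List[Token]:
        out: List[Token] = []
        for tok in tokens:
            # 1. Route to the most specific handler (strings have none here).
            if isinstance(tok, NlTk):
                res = self.on_newline(tok)
            elif isinstance(tok, EOFTk):
                res = self.on_end(tok)
            elif isinstance(tok, TagTk):
                res = self.on_tag(tok)
            else:
                res = None
            # 2. on_any runs only if the token was not modified, no skip
            #    flag was set, and on_any is currently enabled.
            modified = res is not None and res.tokens != [tok]
            skip = res is not None and res.skip_on_any
            if not modified and not skip and self.on_any_enabled:
                res = self.on_any(tok)
            out.extend([tok] if res is None else res.tokens)
        return out
```

Setting on_any_enabled to False, or returning a Result with skip_on_any=True from a specific handler, suppresses the on_any call for the affected tokens.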
 

Static Public Member Functions

static newIndentPreWS ()
 Create a token to represent the indent-pre whitespace character.
 
static isIndentPreWS ( $tokenOrNode)
 Does this token or node represent an indent-pre whitespace character?
 

Additional Inherited Members

Protected Attributes inherited from Wikimedia\Parsoid\Wt2Html\TT\TokenHandler
 $env
 
 $manager
 
 $pipelineId
 
 $options
 
bool $disabled = false
 This is set if the token handler is disabled for the entire pipeline.
 
bool $onAnyEnabled = true
 This is set/reset by the token handlers at various points in the token stream based on what is encountered.
 
 $atTopLevel = false
 

Detailed Description

PRE-handling relies on the following 6-state FSM.

States

SOL -- start-of-line (white-space, comments, meta-tags are all SOL transparent). The FSM always starts in this state.
PRE -- we might need a pre-block (if we enter the PRE_COLLECT state).
PRE_COLLECT -- we will need to generate a pre-block and are collecting content for it.
SOL_AFTER_PRE -- we might need to extend the pre-block to multiple lines (depending on whether we see a white-space tok or not).
MULTILINE_PRE -- we will wrap one or more previous lines with <pre>. This line could be part of that pre if we enter the PRE_COLLECT state.
IGNORE -- nothing to do for the rest of the line.
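As a minimal illustrative sketch (Python, not Parsoid's PHP), the six states can be written as an enum, together with a classifier that maps tokens onto the token classes used in the transition table. The string encoding of tokens (a "\n" string for a newline, "EOF" for end of input, a lone " " for the split-off indent whitespace) and the sample BLOCK_TAGS set are assumptions of this sketch, not Parsoid's token types.

```python
from enum import Enum, auto


class PreState(Enum):
    """The six states of the PRE-handling FSM."""
    SOL = auto()            # start of line; the FSM always starts here
    PRE = auto()            # we might need a pre-block
    PRE_COLLECT = auto()    # collecting content for a pre-block
    SOL_AFTER_PRE = auto()  # we might extend the pre-block to another line
    MULTILINE_PRE = auto()  # previous lines will be wrapped in <pre>
    IGNORE = auto()         # nothing to do for the rest of the line


# Hypothetical sample of block-level tags, for illustration only.
BLOCK_TAGS = {"<div>", "<p>", "<table>"}


def classify(tok: str) -> str:
    """Map a hypothetical string token to a token class from the transition table."""
    if tok == "\n":
        return "nl"
    if tok == "EOF":
        return "eof"
    if tok == " ":
        return "ws"  # the split-off indent whitespace character
    if tok.startswith("<!--") or tok.startswith("<meta"):
        return "sol-tr"  # comments and meta tags are SOL-transparent
    if tok in BLOCK_TAGS:
        return "blk tag"
    return "other"
```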

Action helpers

genPre : return merge("<pre>$TOKS</pre>" while skipping sol-tr toks, sol-tr toks)
processCurrLine : $TOKS += $PRE_TOKS; $PRE_TOKS = [];
purgeBuffers : convert meta token to ' '; processCurrLine; RET = $TOKS; $TOKS = []; return RET
discardCurrLinePre : return merge(genPre, purgeBuffers)
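The buffer manipulations above can be sketched in Python as follows. This is one possible reading, not Parsoid's implementation: tokens are plain strings, WS_META is a hypothetical stand-in for the indent-pre meta token, only HTML comments count as SOL-transparent, and "while skipping sol-tr toks" is interpreted as moving trailing SOL-transparent tokens out of the <pre> to just after </pre>.

```python
from typing import List

# Hypothetical stand-in for the indent-pre whitespace meta token.
WS_META = "<meta:indent-pre-ws>"


def is_sol_tr(tok: str) -> bool:
    """In this sketch only HTML comments count as SOL-transparent."""
    return tok.startswith("<!--")


class PreBuffers:
    """Sketch of the $TOKS / $PRE_TOKS buffers and the four action helpers."""

    def __init__(self) -> None:
        self.toks: List[str] = []      # $TOKS: tokens committed so far
        self.pre_toks: List[str] = []  # $PRE_TOKS: tokens of the current line

    def process_curr_line(self) -> None:
        # $TOKS += $PRE_TOKS; $PRE_TOKS = [];
        self.toks += self.pre_toks
        self.pre_toks = []

    def purge_buffers(self) -> List[str]:
        # Convert the current line's meta token back to ' ', then flush.
        self.pre_toks = [" " if t == WS_META else t for t in self.pre_toks]
        self.process_curr_line()
        ret, self.toks = self.toks, []
        return ret

    def gen_pre(self) -> List[str]:
        # Assumption: trailing SOL-transparent tokens go after </pre>.
        end = len(self.toks)
        while end > 0 and is_sol_tr(self.toks[end - 1]):
            end -= 1
        body, trailing = self.toks[:end], self.toks[end:]
        self.toks = []
        if not body:
            return trailing  # assumption: no <pre> when there is nothing to enclose
        return ["<pre>"] + body + ["</pre>"] + trailing

    def discard_curr_line_pre(self) -> List[str]:
        # merge(genPre, purgeBuffers): close the pre over the earlier lines,
        # then emit the current line as ordinary content.
        return self.gen_pre() + self.purge_buffers()
```

Note that gen_pre keeps WS_META inside the <pre>, since that placeholder is only removed later during cleanup, while purge_buffers turns it back into a literal space because the current line will not become part of a pre.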

Transitions

+---------------+-----------------+---------------+-------------------------+
| Start state   | Token           | End state     | Action                  |
+---------------+-----------------+---------------+-------------------------+
| SOL           | --- nl      --> | SOL           | purgeBuffers            |
| SOL           | --- eof     --> | ---           | purgeBuffers            |
| SOL           | --- sol-tr  --> | SOL           | TOKS << tok             |
| SOL           | --- ws      --> | PRE           | PRE_TOKS = [ wsTok(#) ] |
| SOL           | --- other   --> | IGNORE        | purgeBuffers            |
+---------------+-----------------+---------------+-------------------------+
| PRE           | --- nl      --> | SOL           | purgeBuffers            |
| PRE           | --- eof     --> | ---           | purgeBuffers            |
| PRE           | --- sol-tr  --> | PRE           | PRE_TOKS << tok         |
| PRE           | --- blk tag --> | IGNORE        | purgeBuffers            |
| PRE           | --- other   --> | PRE_COLLECT   | PRE_TOKS << tok         |
+---------------+-----------------+---------------+-------------------------+
| PRE_COLLECT   | --- nl      --> | SOL_AFTER_PRE | processCurrLine         |
| PRE_COLLECT   | --- eof     --> | ---           | processCurrLine; genPre |
| PRE_COLLECT   | --- blk tag --> | IGNORE        | discardCurrLinePre      |
| PRE_COLLECT   | --- other   --> | PRE_COLLECT   | PRE_TOKS << tok         |
+---------------+-----------------+---------------+-------------------------+
| SOL_AFTER_PRE | --- nl      --> | SOL           | discardCurrLinePre      |
| SOL_AFTER_PRE | --- eof     --> | ---           | discardCurrLinePre      |
| SOL_AFTER_PRE | --- sol-tr  --> | SOL_AFTER_PRE | PRE_TOKS << tok         |
| SOL_AFTER_PRE | --- ws      --> | MULTILINE_PRE | PRE_TOKS << wsTok(#)    |
| SOL_AFTER_PRE | --- other   --> | IGNORE        | discardCurrLinePre      |
+---------------+-----------------+---------------+-------------------------+
| MULTILINE_PRE | --- nl      --> | SOL_AFTER_PRE | processCurrLine         |
| MULTILINE_PRE | --- eof     --> | ---           | discardCurrLinePre      |
| MULTILINE_PRE | --- sol-tr  --> | SOL_AFTER_PRE | PRE_TOKS << tok         |
| MULTILINE_PRE | --- blk tag --> | IGNORE        | discardCurrLinePre      |
| MULTILINE_PRE | --- other   --> | PRE_COLLECT   | PRE_TOKS << tok         |
+---------------+-----------------+---------------+-------------------------+
| IGNORE        | --- eof     --> | ---           | purgeBuffers            |
| IGNORE        | --- nl      --> | SOL           | purgeBuffers            |
+---------------+-----------------+---------------+-------------------------+
# In these states, we assume that the whitespace char is split off from the rest of the string.
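One way to check a reading of the table is to encode it as data. The sketch below transcribes the rows into a Python dictionary keyed by (start state, token class) and replays a sequence of token classes starting from SOL. The fallback for cells the table leaves empty (use the row's "other" entry, or, in IGNORE, stay put and pass the token through) is an assumption of this sketch, not something the table states.

```python
from typing import Dict, List, Tuple

# (start state, token class) -> (end state, action), transcribed from the table.
# "---" as an end state means processing stops at EOF.
TRANSITIONS: Dict[Tuple[str, str], Tuple[str, str]] = {
    ("SOL", "nl"): ("SOL", "purgeBuffers"),
    ("SOL", "eof"): ("---", "purgeBuffers"),
    ("SOL", "sol-tr"): ("SOL", "TOKS << tok"),
    ("SOL", "ws"): ("PRE", "PRE_TOKS = [ wsTok(#) ]"),
    ("SOL", "other"): ("IGNORE", "purgeBuffers"),
    ("PRE", "nl"): ("SOL", "purgeBuffers"),
    ("PRE", "eof"): ("---", "purgeBuffers"),
    ("PRE", "sol-tr"): ("PRE", "PRE_TOKS << tok"),
    ("PRE", "blk tag"): ("IGNORE", "purgeBuffers"),
    ("PRE", "other"): ("PRE_COLLECT", "PRE_TOKS << tok"),
    ("PRE_COLLECT", "nl"): ("SOL_AFTER_PRE", "processCurrLine"),
    ("PRE_COLLECT", "eof"): ("---", "processCurrLine; genPre"),
    ("PRE_COLLECT", "blk tag"): ("IGNORE", "discardCurrLinePre"),
    ("PRE_COLLECT", "other"): ("PRE_COLLECT", "PRE_TOKS << tok"),
    ("SOL_AFTER_PRE", "nl"): ("SOL", "discardCurrLinePre"),
    ("SOL_AFTER_PRE", "eof"): ("---", "discardCurrLinePre"),
    ("SOL_AFTER_PRE", "sol-tr"): ("SOL_AFTER_PRE", "PRE_TOKS << tok"),
    ("SOL_AFTER_PRE", "ws"): ("MULTILINE_PRE", "PRE_TOKS << wsTok(#)"),
    ("SOL_AFTER_PRE", "other"): ("IGNORE", "discardCurrLinePre"),
    ("MULTILINE_PRE", "nl"): ("SOL_AFTER_PRE", "processCurrLine"),
    ("MULTILINE_PRE", "eof"): ("---", "discardCurrLinePre"),
    ("MULTILINE_PRE", "sol-tr"): ("SOL_AFTER_PRE", "PRE_TOKS << tok"),
    ("MULTILINE_PRE", "blk tag"): ("IGNORE", "discardCurrLinePre"),
    ("MULTILINE_PRE", "other"): ("PRE_COLLECT", "PRE_TOKS << tok"),
    ("IGNORE", "eof"): ("---", "purgeBuffers"),
    ("IGNORE", "nl"): ("SOL", "purgeBuffers"),
}


def lookup(state: str, cls: str) -> Tuple[str, str]:
    """Find the row for (state, cls); assumed fallbacks fill empty cells."""
    for key in ((state, cls), (state, "other")):
        if key in TRANSITIONS:
            return TRANSITIONS[key]
    # Only IGNORE lacks an "other" row: stay put and pass the token through.
    return (state, "pass through")


def run(classes: List[str]) -> List[Tuple[str, str]]:
    """Replay a sequence of token classes from SOL and record each step."""
    state = "SOL"  # the FSM always starts in SOL
    trace: List[Tuple[str, str]] = []
    for cls in classes:
        state, action = lookup(state, cls)
        trace.append((state, action))
        if state == "---":
            break
    return trace
```

Replaying ["ws", "other", "nl", "ws", "other", "nl", "other", "eof"] (two indented lines followed by an unindented one) visits PRE, PRE_COLLECT, SOL_AFTER_PRE, MULTILINE_PRE, PRE_COLLECT, SOL_AFTER_PRE and IGNORE before stopping, with discardCurrLinePre closing the two-line pre when the unindented line starts.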

Constructor & Destructor Documentation

◆ __construct()

Wikimedia\Parsoid\Wt2Html\TT\PreHandler::__construct ( TokenTransformManager $manager,
array $options )
Parameters
TokenTransformManager $manager - manager environment
array $options - various configuration options

Reimplemented from Wikimedia\Parsoid\Wt2Html\TT\TokenHandler.

Member Function Documentation

◆ isIndentPreWS()

static Wikimedia\Parsoid\Wt2Html\TT\PreHandler::isIndentPreWS ( $tokenOrNode)
static

Does this token or node represent an indent-pre whitespace character?

Parameters
Token|Node|string $tokenOrNode
Returns
bool

◆ newIndentPreWS()

static Wikimedia\Parsoid\Wt2Html\TT\PreHandler::newIndentPreWS ( )
static

Create a token to represent the indent-pre whitespace character.

Notes about the choice of token representation

This token will not make it to the final output and is only present to ensure DSR computation can account for this whitespace character. This meta tag will be removed in CleanUp::stripMarkerMetas().

Given that this token is purely an internal bookkeeping placeholder, it really does not matter how we represent it as long as (a) it doesn't impede code comprehension, (b) it is more or less consistent with how other instances of this token behave, and (c) it doesn't introduce a lot of special-case handling and checks to deal with it.

Based on that consideration, we settle for a meta tag because meta tags are transparent to most token and DOM handlers.
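To make the trade-off concrete, here is a hypothetical Python sketch of a placeholder meta token and a predicate in the spirit of newIndentPreWS() and isIndentPreWS(). The SelfClosingTag class and the INDENT_PRE_TYPEOF marker value are assumptions made for illustration; the actual attribute Parsoid uses to tag this meta is not documented here. The content_tokens helper shows why a meta tag is convenient: a handler that already ignores meta tags skips the placeholder without any special case.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union

# Hypothetical marker value used only in this sketch.
INDENT_PRE_TYPEOF = "sketch:indent-pre-ws"


@dataclass
class SelfClosingTag:
    """Hypothetical self-closing tag token, e.g. <meta .../>."""
    name: str
    attrs: Dict[str, str] = field(default_factory=dict)


Tok = Union[SelfClosingTag, str]


def new_indent_pre_ws() -> SelfClosingTag:
    """Create the placeholder meta token for the indent-pre space."""
    return SelfClosingTag("meta", {"typeof": INDENT_PRE_TYPEOF})


def is_indent_pre_ws(tok: Tok) -> bool:
    """True only for the placeholder meta token; plain strings never match."""
    return (
        isinstance(tok, SelfClosingTag)
        and tok.name == "meta"
        and tok.attrs.get("typeof") == INDENT_PRE_TYPEOF
    )


def content_tokens(toks: List[Tok]) -> List[Tok]:
    """A meta-transparent handler: skips every meta tag, placeholder included."""
    return [t for t in toks if not (isinstance(t, SelfClosingTag) and t.name == "meta")]
```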

Notes about DSR computation

Once we are done with all DOM processing, we expect indent-pre <pre> tags to have
DSR that looks like [ _, _, 1, 0 ], i.e. an opening tag width of 1 char and a
closing tag width of 0 chars. But since we now explicitly represent the ws char
as a meta tag, the <pre> tag will not get a 1-char width during DSR computation,
because this meta tag consumes that width. Accordingly, once we strip this meta tag in
the cleanup pass, we reassign its width to the opening tag width of the <pre> tag.
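The width bookkeeping amounts to simple arithmetic. The sketch below assumes DSR is a 4-tuple (start, end, open-tag width, close-tag width) and that the <pre> range already covers the whitespace character; strip_indent_pre_meta is a hypothetical helper, not the CleanUp::stripMarkerMetas() code.

```python
from typing import Tuple

Dsr = Tuple[int, int, int, int]  # (start, end, open-tag width, close-tag width)


def strip_indent_pre_meta(pre_dsr: Dsr, meta_dsr: Dsr) -> Dsr:
    """Fold the width of a stripped indent-pre meta tag into its <pre> opening tag."""
    start, end, open_w, close_w = pre_dsr
    meta_width = meta_dsr[1] - meta_dsr[0]  # normally 1: the leading space
    return (start, end, open_w + meta_width, close_w)
```

For the one-line wikitext " foo" at offset 0, the <pre> DSR (0, 4, 0, 0) becomes (0, 4, 1, 0) once the 1-char meta at (0, 1, 0, 0) is folded in, which matches the expected [ _, _, 1, 0 ] shape.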

Returns
Token

◆ onAny()

Wikimedia\Parsoid\Wt2Html\TT\PreHandler::onAny ( $token)

This handler is called for all tokens in the token stream except when (a) one of the more specific handlers above modified the token, (b) the more specific handlers (onTag, onEnd, onNewline) set the skip flag in their return values, or (c) this handler's 'active' flag is set to false (any of the handlers can set it).

Parameters
Token|string $token - Token to be processed
Returns
TokenHandlerResult|null A TokenHandlerResult, or null to efficiently indicate that the input token is unchanged.

Reimplemented from Wikimedia\Parsoid\Wt2Html\TT\TokenHandler.

◆ onEnd()

Wikimedia\Parsoid\Wt2Html\TT\PreHandler::onEnd ( EOFTk $token)

This handler is called for EOF tokens only.

Parameters
EOFTk $token - EOF token to be processed
Returns
TokenHandlerResult|null A TokenHandlerResult, or null to efficiently indicate that the input token is unchanged.

Reimplemented from Wikimedia\Parsoid\Wt2Html\TT\TokenHandler.

◆ onNewline()

Wikimedia\Parsoid\Wt2Html\TT\PreHandler::onNewline ( NlTk $token)

This handler is called for newline tokens only.

Parameters
NlTk $token - Newline token to be processed
Returns
TokenHandlerResult|null A TokenHandlerResult, or null to efficiently indicate that the input token is unchanged.

Reimplemented from Wikimedia\Parsoid\Wt2Html\TT\TokenHandler.

◆ resetState()

Wikimedia\Parsoid\Wt2Html\TT\PreHandler::resetState ( array $options)

Resets any internal state for this token handler.

Parameters
array $options

Reimplemented from Wikimedia\Parsoid\Wt2Html\TT\TokenHandler.


The documentation for this class was generated from the following file: