Parsoid
A bidirectional parser between wikitext and HTML5
HTML -> Wikitext serialization relies on walking the DOM and delegating the serialization requests to different DOM nodes.
Public Member Functions
__construct (bool $forceSOL=false)
handle (Element $node, SerializerState $state, bool $wrapperUnmodified=false)
    Serialize a DOM node to wikitext.
before (Element $node, Node $otherNode, SerializerState $state)
    How many newlines should be emitted before this node?
after (Element $node, Node $otherNode, SerializerState $state)
    How many newlines should be emitted after this node?
firstChild (Node $node, Node $otherNode, SerializerState $state)
    How many newlines should be emitted before the first child?
lastChild (Node $node, Node $otherNode, SerializerState $state)
    How many newlines should be emitted after the last child?
forceSOL ()
    Put the serializer in start-of-line mode before it is handled.
Protected Member Functions
getListBullets (SerializerState $state, Element $node)
    List helper: DOM-based list bullet construction.
maxNLsInTable (Node $node, Node $origNode)
    Helper: Newline constraint helper for table nodes.
serializeTableTag (string $symbol, ?string $endSymbol, SerializerState $state, Element $node, bool $wrapperUnmodified)
    Helper: Handles content serialization for table nodes.
stxInfoValidForTableCell (SerializerState $state, Element $node)
    Helper: Checks whether syntax information in data-parsoid is valid in the presence of table edits.
getLeadingSpace (SerializerState $state, Element $node, string $newEltDefault)
    Helper for several DOM handlers: Returns whitespace that needs to be emitted between the markup for the node and its content (ex: table cells, list items) based on node state (whether the node is original or new content) and other state (HTML version, whether selective serialization is enabled or not).
getTrailingSpace (SerializerState $state, Element $node, string $newEltDefault)
    Helper for several DOM handlers: Returns whitespace that needs to be emitted between the markup for the node and its next sibling based on node state (whether the node is original or new content) and other state (HTML version, whether selective serialization is enabled or not).
emitPlaceholderSrc (Element $node, SerializerState $state)
    Uneditable forms wrapped with mw:Placeholder tags OR unedited nowikis.
HTML -> Wikitext serialization relies on walking the DOM and delegating the serialization requests to different DOM nodes.
This class represents the interface that various DOM handlers are expected to implement.
There is the core 'handle' method that deals with converting the content of the node into wikitext markup.
Then there are 4 newline-constraint methods that specify the constraints that need to be satisfied for the markup to be valid. For example, list items should always start on a newline, but can only have a single newline separator. Paragraphs always start on a newline and need at least 2 newlines in wikitext for them to be recognized as paragraphs.
Each of the 4 newline-constraint methods (before, after, firstChild, lastChild) returns an array with 'min' and 'max' properties. If a property is missing, the DOM node has no newline constraint for it. Some DOM handlers might therefore choose to implement none, some, or all of these methods.
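As a minimal sketch of the return shape described here, the standalone functions below mirror the list-item and paragraph examples from the text. The function names are hypothetical and are not Parsoid's handler classes; only the 'min' and 'max' keys come from the description, and the concrete values are assumptions read off the prose.

```php
<?php

/**
 * Hypothetical stand-in for a list-item 'before' constraint:
 * start on a new line, but allow only a single newline separator.
 */
function listItemBefore(): array {
    return ['min' => 1, 'max' => 1];
}

/**
 * Hypothetical stand-in for a paragraph 'before' constraint:
 * at least 2 newlines. No upper bound is stated in the text,
 * so 'max' is left out in this sketch.
 */
function paragraphBefore(): array {
    return ['min' => 2];
}

/** A handler without newline constraints can return an empty array. */
function noConstraints(): array {
    return [];
}
```

A caller can then read each bound with a default, for example `$c['min'] ?? 0`, so a missing key simply means "not constrained".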
The return values of each of these methods are treated as constraints, and the caller has to resolve potentially conflicting constraints between a pair of nodes (siblings, parent-child). For example, the after handler of a node might want 1 newline while the before handler of its sibling wants none.
Ideally, there should not be any incompatible constraints, but we haven't actually verified that this is the case. All constraint-handling code is in the separators-handling methods.
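To make the idea of resolving a pair of constraints concrete, here is one possible merge policy. It is an illustrative assumption, not Parsoid's actual separators-handling logic: `resolveNewlines` is a hypothetical name, a missing 'min' is treated as 0, a missing 'max' as unbounded, and on conflict the stricter upper bound wins.

```php
<?php

/**
 * Merge two newline constraints (e.g. the 'after' constraint of a node and
 * the 'before' constraint of its next sibling) into one newline count.
 *
 * Assumption: missing 'min' means 0, missing 'max' means "unbounded".
 */
function resolveNewlines(array $a, array $b): int {
    $min = max($a['min'] ?? 0, $b['min'] ?? 0);
    $max = min($a['max'] ?? PHP_INT_MAX, $b['max'] ?? PHP_INT_MAX);

    if ($min > $max) {
        // Incompatible constraints: this sketch arbitrarily lets the
        // stricter upper bound win.
        return $max;
    }
    // Compatible constraints: emit the fewest newlines that satisfy both.
    return $min;
}

// The conflict from the text: 'after' wants 1 newline, 'before' wants none.
$conflict = resolveNewlines(['min' => 1, 'max' => 1], ['min' => 0, 'max' => 0]);

// A compatible pair: a 'min' of 2 next to a neighbour that allows up to 2.
$compatible = resolveNewlines(['min' => 2], ['min' => 0, 'max' => 2]);
```

Other tie-breaking rules are equally plausible (for instance, preferring the larger 'min'); the point is only that the caller, not the handlers, decides.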
Wikimedia\Parsoid\Html2Wt\DOMHandlers\DOMHandler::after(Element $node, Node $otherNode, SerializerState $state)
How many newlines should be emitted after this node?
Parameters:
    Element $node
    Node $otherNode
    SerializerState $state
Reimplemented in Wikimedia\Parsoid\Html2Wt\DOMHandlers\BRHandler, Wikimedia\Parsoid\Html2Wt\DOMHandlers\CaptionHandler, Wikimedia\Parsoid\Html2Wt\DOMHandlers\DDHandler, Wikimedia\Parsoid\Html2Wt\DOMHandlers\DTHandler, Wikimedia\Parsoid\Html2Wt\DOMHandlers\EncapsulatedContentHandler, Wikimedia\Parsoid\Html2Wt\DOMHandlers\FigureHandler, Wikimedia\Parsoid\Html2Wt\DOMHandlers\HeadingHandler, Wikimedia\Parsoid\Html2Wt\DOMHandlers\HRHandler, Wikimedia\Parsoid\Html2Wt\DOMHandlers\LIHandler, Wikimedia\Parsoid\Html2Wt\DOMHandlers\LinkHandler, Wikimedia\Parsoid\Html2Wt\DOMHandlers\ListHandler, Wikimedia\Parsoid\Html2Wt\DOMHandlers\MetaHandler, Wikimedia\Parsoid\Html2Wt\DOMHandlers\PHandler, Wikimedia\Parsoid\Html2Wt\DOMHandlers\PreHandler, Wikimedia\Parsoid\Html2Wt\DOMHandlers\TableHandler, Wikimedia\Parsoid\Html2Wt\DOMHandlers\TDHandler, Wikimedia\Parsoid\Html2Wt\DOMHandlers\THHandler, and Wikimedia\Parsoid\Html2Wt\DOMHandlers\TRHandler.
Wikimedia\Parsoid\Html2Wt\DOMHandlers\DOMHandler::before(Element $node, Node $otherNode, SerializerState $state)
How many newlines should be emitted before this node?
Parameters:
    Element $node
    Node $otherNode
    SerializerState $state
Reimplemented in Wikimedia\Parsoid\Html2Wt\DOMHandlers\BRHandler, Wikimedia\Parsoid\Html2Wt\DOMHandlers\CaptionHandler, Wikimedia\Parsoid\Html2Wt\DOMHandlers\DDHandler, Wikimedia\Parsoid\Html2Wt\DOMHandlers\DTHandler, Wikimedia\Parsoid\Html2Wt\DOMHandlers\EncapsulatedContentHandler, Wikimedia\Parsoid\Html2Wt\DOMHandlers\FigureHandler, Wikimedia\Parsoid\Html2Wt\DOMHandlers\HeadingHandler, Wikimedia\Parsoid\Html2Wt\DOMHandlers\HRHandler, Wikimedia\Parsoid\Html2Wt\DOMHandlers\LIHandler, Wikimedia\Parsoid\Html2Wt\DOMHandlers\LinkHandler, Wikimedia\Parsoid\Html2Wt\DOMHandlers\ListHandler, Wikimedia\Parsoid\Html2Wt\DOMHandlers\MetaHandler, Wikimedia\Parsoid\Html2Wt\DOMHandlers\PHandler, Wikimedia\Parsoid\Html2Wt\DOMHandlers\PreHandler, Wikimedia\Parsoid\Html2Wt\DOMHandlers\TableHandler, Wikimedia\Parsoid\Html2Wt\DOMHandlers\TDHandler, Wikimedia\Parsoid\Html2Wt\DOMHandlers\THHandler, and Wikimedia\Parsoid\Html2Wt\DOMHandlers\TRHandler.
Wikimedia\Parsoid\Html2Wt\DOMHandlers\DOMHandler::emitPlaceholderSrc(Element $node, SerializerState $state) [protected]
Uneditable forms wrapped with mw:Placeholder tags OR unedited nowikis.
N.B. We no longer emit self-closed nowikis as placeholders, so remove this once all our stored content is updated.
Parameters:
    Element $node
    SerializerState $state
Wikimedia\Parsoid\Html2Wt\DOMHandlers\DOMHandler::firstChild(Node $node, Node $otherNode, SerializerState $state)
How many newlines should be emitted before the first child?
Parameters:
    Element|DocumentFragment $node
    Node $otherNode
    SerializerState $state
Reimplemented in Wikimedia\Parsoid\Html2Wt\DOMHandlers\BodyHandler, Wikimedia\Parsoid\Html2Wt\DOMHandlers\DDHandler, Wikimedia\Parsoid\Html2Wt\DOMHandlers\DTHandler, Wikimedia\Parsoid\Html2Wt\DOMHandlers\HTMLPreHandler, Wikimedia\Parsoid\Html2Wt\DOMHandlers\LIHandler, Wikimedia\Parsoid\Html2Wt\DOMHandlers\PreHandler, and Wikimedia\Parsoid\Html2Wt\DOMHandlers\TableHandler.
Wikimedia\Parsoid\Html2Wt\DOMHandlers\DOMHandler::forceSOL()
Put the serializer in start-of-line mode before it is handled.
All non-newline whitespace found between HTML nodes is stripped to ensure SOL state is guaranteed.
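The stripping step can be pictured with a small regular-expression sketch. `stripNonNewlineWhitespace` is a hypothetical helper written for illustration, not a Parsoid function; it assumes the text between two nodes is available as a plain string.

```php
<?php

/**
 * Hypothetical helper: remove all whitespace except newlines from the text
 * found between two nodes, so the next markup starts at the start of a line.
 */
function stripNonNewlineWhitespace(string $separator): string {
    // [^\S\n] matches any whitespace character that is not a newline.
    // Fall back to the input if the regex engine reports an error.
    return preg_replace('/[^\S\n]+/', '', $separator) ?? $separator;
}

// Spaces and tabs are dropped; the two newline characters are kept.
$cleaned = stripNonNewlineWhitespace(" \t\n  \n ");
```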
Wikimedia\Parsoid\Html2Wt\DOMHandlers\DOMHandler::getLeadingSpace(SerializerState $state, Element $node, string $newEltDefault) [protected]
Helper for several DOM handlers: Returns whitespace that needs to be emitted between the markup for the node and its content (ex: table cells, list items) based on node state (whether the node is original or new content) and other state (HTML version, whether selective serialization is enabled or not).
Parameters:
    SerializerState $state
    Element $node
    string $newEltDefault
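Wikimedia\Parsoid\Html2Wt\DOMHandlers\DOMHandler::getListBullets(SerializerState $state, Element $node) [protected]
List helper: DOM-based list bullet construction.
Parameters:
    SerializerState $state
    Element $node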
Wikimedia\Parsoid\Html2Wt\DOMHandlers\DOMHandler::getTrailingSpace(SerializerState $state, Element $node, string $newEltDefault) [protected]
Helper for several DOM handlers: Returns whitespace that needs to be emitted between the markup for the node and its next sibling based on node state (whether the node is original or new content) and other state (HTML version, whether selective serialization is enabled or not).
Parameters:
    SerializerState $state
    Element $node
    string $newEltDefault
Wikimedia\Parsoid\Html2Wt\DOMHandlers\DOMHandler::handle(Element $node, SerializerState $state, bool $wrapperUnmodified = false)
Serialize a DOM node to wikitext.
Serialized wikitext should be returned via $state::emitChunk().
Parameters:
    Element $node
    SerializerState $state
    bool $wrapperUnmodified
Reimplemented in Wikimedia\Parsoid\Html2Wt\DOMHandlers\AHandler, Wikimedia\Parsoid\Html2Wt\DOMHandlers\BodyHandler, Wikimedia\Parsoid\Html2Wt\DOMHandlers\BRHandler, Wikimedia\Parsoid\Html2Wt\DOMHandlers\CaptionHandler, Wikimedia\Parsoid\Html2Wt\DOMHandlers\DDHandler, Wikimedia\Parsoid\Html2Wt\DOMHandlers\DTHandler, Wikimedia\Parsoid\Html2Wt\DOMHandlers\EncapsulatedContentHandler, Wikimedia\Parsoid\Html2Wt\DOMHandlers\FallbackHTMLHandler, Wikimedia\Parsoid\Html2Wt\DOMHandlers\FigureHandler, Wikimedia\Parsoid\Html2Wt\DOMHandlers\HeadingHandler, Wikimedia\Parsoid\Html2Wt\DOMHandlers\HRHandler, Wikimedia\Parsoid\Html2Wt\DOMHandlers\HTMLPreHandler, Wikimedia\Parsoid\Html2Wt\DOMHandlers\ImgHandler, Wikimedia\Parsoid\Html2Wt\DOMHandlers\JustChildrenHandler, Wikimedia\Parsoid\Html2Wt\DOMHandlers\LIHandler, Wikimedia\Parsoid\Html2Wt\DOMHandlers\LinkHandler, Wikimedia\Parsoid\Html2Wt\DOMHandlers\ListHandler, Wikimedia\Parsoid\Html2Wt\DOMHandlers\MediaHandler, Wikimedia\Parsoid\Html2Wt\DOMHandlers\MetaHandler, Wikimedia\Parsoid\Html2Wt\DOMHandlers\PHandler, Wikimedia\Parsoid\Html2Wt\DOMHandlers\PreHandler, Wikimedia\Parsoid\Html2Wt\DOMHandlers\QuoteHandler, Wikimedia\Parsoid\Html2Wt\DOMHandlers\SpanHandler, Wikimedia\Parsoid\Html2Wt\DOMHandlers\TableHandler, Wikimedia\Parsoid\Html2Wt\DOMHandlers\TDHandler, Wikimedia\Parsoid\Html2Wt\DOMHandlers\THHandler, and Wikimedia\Parsoid\Html2Wt\DOMHandlers\TRHandler.
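The "emit through the state" contract of handle can be sketched without any Parsoid classes. Everything below is a simplified stand-in: `SketchElement`, `SketchState`, and `SketchHRHandler` are hypothetical names, and the real `emitChunk()` may take different arguments than the single string assumed here.

```php
<?php

/** Simplified stand-in for a DOM element; only the tag name is kept. */
class SketchElement {
    /** @var string */
    public $tagName;

    public function __construct(string $tagName) {
        $this->tagName = $tagName;
    }
}

/** Simplified stand-in for SerializerState; it only collects chunks. */
class SketchState {
    /** @var string[] */
    public $chunks = [];

    // Assumption: one string argument; the real signature may differ.
    public function emitChunk(string $chunk): void {
        $this->chunks[] = $chunk;
    }
}

/** Hypothetical handler that serializes a horizontal rule. */
class SketchHRHandler {
    public function handle(
        SketchElement $node, SketchState $state, bool $wrapperUnmodified = false
    ): void {
        // The wikitext is handed to the state rather than returned.
        $state->emitChunk('----');
    }
}

$state = new SketchState();
(new SketchHRHandler())->handle(new SketchElement('hr'), $state);
$wikitext = implode('', $state->chunks);
```

Because handlers write into shared state instead of returning strings, the serializer can interleave separator handling between chunks emitted by different nodes.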
Wikimedia\Parsoid\Html2Wt\DOMHandlers\DOMHandler::lastChild(Node $node, Node $otherNode, SerializerState $state)
How many newlines should be emitted after the last child?
Parameters:
    Element|DocumentFragment $node
    Node $otherNode
    SerializerState $state
Reimplemented in Wikimedia\Parsoid\Html2Wt\DOMHandlers\BodyHandler, Wikimedia\Parsoid\Html2Wt\DOMHandlers\HTMLPreHandler, Wikimedia\Parsoid\Html2Wt\DOMHandlers\PreHandler, and Wikimedia\Parsoid\Html2Wt\DOMHandlers\TableHandler.
Wikimedia\Parsoid\Html2Wt\DOMHandlers\DOMHandler::maxNLsInTable(Node $node, Node $origNode) [protected]
Helper: Newline constraint helper for table nodes.
Parameters:
    Node $node
    Node $origNode
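Wikimedia\Parsoid\Html2Wt\DOMHandlers\DOMHandler::serializeTableTag(string $symbol, ?string $endSymbol, SerializerState $state, Element $node, bool $wrapperUnmodified) [protected]
Helper: Handles content serialization for table nodes.
Parameters:
    string $symbol
    ?string $endSymbol
    SerializerState $state
    Element $node
    bool $wrapperUnmodified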
Wikimedia\Parsoid\Html2Wt\DOMHandlers\DOMHandler::stxInfoValidForTableCell(SerializerState $state, Element $node) [protected]
Helper: Checks whether syntax information in data-parsoid is valid in the presence of table edits.
For example, "|" is no longer valid table-cell markup if a table cell is added before this cell.
Parameters:
    SerializerState $state
    Element $node