Parsoid
A bidirectional parser between wikitext and HTML5
Wikimedia\Parsoid\Html2Wt\WikitextSerializer Class Reference

HTML to wikitext serializer.

Public Member Functions

 __construct (Env $env, $options)
 
 linkHandler (Element $node)
 Main link handler.
 
 languageVariantHandler (Node $node)
 
 escapeWikitext (SerializerState $state, string $text, array $opts)
 Escape wikitext-like strings in '$text' so that $text renders as a plain string when rendered as HTML.
 
 domToWikitext (array $opts, DocumentFragment $node)
 
 htmlToWikitext (array $opts, string $html)
 
 getAttributeKey (Element $node, string $key)
 
 getAttributeValue (Element $node, string $key)
 
 getAttributeValueAsShadowInfo (Element $node, string $key)
 
 serializedImageAttrVal (Element $dataMWnode, Element $htmlAttrNode, string $key)
 
 serializedAttrVal (Element $node, string $name)
 
 tagNeedsEscaping (string $name)
 Check if token needs escaping.
 
 wrapAngleBracket (Token $token, string $inner)
 
 serializeHTMLTag (Element $node, bool $wrapperUnmodified)
 
 serializeHTMLEndTag (Element $node, $wrapperUnmodified)
 
 serializeAttributes (Element $node, Token $token, bool $isWt=false)
 
 handleLIHackIfApplicable (Element $node)
 FIXME: Get rid of this function after content version 2.2.0 has expired from caches.
 
 serializeFromParts (SerializerState $state, Element $node, array $srcParts)
 Serialize a template from its parts.
 
 serializeExtensionStartTag (Element $node, SerializerState $state)
 
 defaultExtensionHandler (Element $node, SerializerState $state)
 
 emitWikitext (string $res, Node $node)
 Emit non-separator wikitext that does not need to be escaped.
 
 serializeDOM (Node $node, bool $selserMode=false)
 Serialize an HTML DOM.
 
 trace (... $args)
 

Public Attributes

 $wteHandlers
 
 $env
 

Detailed Description

HTML to wikitext serializer.

Serializes a chunk of tokens or an HTML DOM to MediaWiki's wikitext flavor.

This serializer is designed to eventually

  • accept arbitrary HTML and
  • serialize that to wikitext in a way that round-trips back to the same HTML DOM as far as possible within the limitations of wikitext.

Little effort has so far gone into supporting HTML that was not generated by Parsoid or VisualEditor (VE). Supporting it involves adaptively switching between wikitext and HTML representations based on attribute values and DOM context. A few special cases are already handled adaptively (for example, multi-paragraph list item contents are serialized as HTML tags, and generic A elements are serialized to HTML A tags), but general support is mostly missing.

Example issue:

<h1><p>foo</p></h1> will serialize to =\nfoo\n= whereas the
correct serialized output would be: =<p>foo</p>=

What to do about this?

  • add a generic 'can this HTML node be serialized to wikitext in this context' detection method and use that to adaptively switch between wikitext and HTML serialization.
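The detection idea above can be illustrated with a small, self-contained sketch. It is not Parsoid code: the function names canSerializeAsWikitext() and serializeNode() are hypothetical, the sketch uses PHP's bundled DOM extension rather than Parsoid's DOM classes, and the only rule it knows is the heading example from this page.

```php
<?php
// Hypothetical sketch of context-aware serialization; not Parsoid's implementation.

const HEADING_TAGS = ['h1', 'h2', 'h3', 'h4', 'h5', 'h6'];

/**
 * Assumed rule: a <p> directly inside a heading cannot use wikitext paragraph
 * syntax, because the newlines it needs would break the single-line heading.
 */
function canSerializeAsWikitext(DOMElement $node): bool
{
    $parent = $node->parentNode;
    $inHeading = $parent instanceof DOMElement
        && in_array(strtolower($parent->tagName), HEADING_TAGS, true);
    return !($inHeading && strtolower($node->tagName) === 'p');
}

/** Serialize a tiny subset of HTML, falling back to HTML tags when needed. */
function serializeNode(DOMNode $node): string
{
    if ($node instanceof DOMText) {
        return $node->data;
    }
    if (!($node instanceof DOMElement)) {
        return '';
    }
    $inner = '';
    foreach ($node->childNodes as $child) {
        $inner .= serializeNode($child);
    }
    $tag = strtolower($node->tagName);
    if (in_array($tag, HEADING_TAGS, true)) {
        // "h2" becomes "==", "h3" becomes "===", and so on.
        $marker = str_repeat('=', (int)substr($tag, 1));
        return $marker . $inner . $marker;
    }
    if ($tag === 'p' && canSerializeAsWikitext($node)) {
        return "\n" . $inner . "\n";
    }
    // Fallback: keep the element as a literal HTML tag.
    return "<$tag>" . $inner . "</$tag>";
}

// Build <h1><p>foo</p></h1> programmatically to avoid HTML parser fix-ups.
$doc = new DOMDocument();
$h1 = $doc->createElement('h1');
$p = $doc->createElement('p');
$p->appendChild($doc->createTextNode('foo'));
$h1->appendChild($p);
$doc->appendChild($h1);

echo serializeNode($h1), "\n"; // prints "=<p>foo</p>="
```

A real detection method would need to consider many more contexts (lists, tables, links, templates); the sketch only shows where such a check would sit in the serialization path.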

Constructor & Destructor Documentation

◆ __construct()

Wikimedia\Parsoid\Html2Wt\WikitextSerializer::__construct ( Env $env,
$options )
Parameters
Env $env
array $options  List of options for serialization:
  • logType: (string)
  • extName: (string)

Member Function Documentation

◆ defaultExtensionHandler()

Wikimedia\Parsoid\Html2Wt\WikitextSerializer::defaultExtensionHandler ( Element $node,
SerializerState $state )
Parameters
Element $node
SerializerState $state
Returns
string

◆ domToWikitext()

Wikimedia\Parsoid\Html2Wt\WikitextSerializer::domToWikitext ( array $opts,
DocumentFragment $node )
Parameters
array $opts
DocumentFragment $node
Returns
string

◆ emitWikitext()

Wikimedia\Parsoid\Html2Wt\WikitextSerializer::emitWikitext ( string $res,
Node $node )

Emit non-separator wikitext that does not need to be escaped.

Parameters
string $res
Node $node

◆ escapeWikitext()

Wikimedia\Parsoid\Html2Wt\WikitextSerializer::escapeWikitext ( SerializerState $state,
string $text,
array $opts )

Escape wikitext-like strings in '$text' so that $text renders as a plain string when rendered as HTML.

The escaping is done based on the context in which $text is present (ex: start-of-line, in a link, etc.)

Parameters
SerializerState $state
string $text
array $opts
  • node: (Node)
  • isLastChild: (bool)
Returns
string
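
To make the idea of context-dependent escaping concrete, here is a deliberately simplified sketch. It is not the algorithm used by escapeWikitext(): the function escapeForContext() and its two boolean parameters are hypothetical stand-ins for the context that Parsoid derives from the SerializerState and $opts, and the trigger patterns cover only a few common constructs.

```php
<?php
// Hypothetical sketch of context-dependent escaping; not Parsoid's algorithm.

/**
 * Wrap $text in <nowiki> when it would otherwise be parsed as wikitext
 * syntax in the given context; return it unchanged when it is harmless.
 */
function escapeForContext(string $text, bool $atStartOfLine, bool $inLink): string
{
    // These characters open a list, definition, or heading only at start-of-line.
    $solTrigger = $atStartOfLine && preg_match('/^[*#:;=]/', $text) === 1;

    // Inside a link, "|" and "]]" would end the target or the link itself.
    $linkTrigger = $inLink && preg_match('/\||\]\]/', $text) === 1;

    // Quote runs, link openers, and template openers matter in any context.
    $anywhereTrigger = preg_match('/\'\'|\[\[|\{\{/', $text) === 1;

    if ($solTrigger || $linkTrigger || $anywhereTrigger) {
        return '<nowiki>' . $text . '</nowiki>';
    }
    return $text;
}

echo escapeForContext('* not a list', true, false), "\n";  // wrapped in <nowiki>
echo escapeForContext('* not a list', false, false), "\n"; // left unchanged
```

The same string is escaped in one context and left alone in another, which is the behaviour the description above refers to. A production escaper would also try to wrap as little text as possible, which this sketch does not attempt.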

◆ getAttributeKey()

Wikimedia\Parsoid\Html2Wt\WikitextSerializer::getAttributeKey ( Element $node,
string $key )
Parameters
Element $node
string $key
Returns
string

◆ getAttributeValue()

Wikimedia\Parsoid\Html2Wt\WikitextSerializer::getAttributeValue ( Element $node,
string $key )
Parameters
Element $node
string $key  Attribute name.
Returns
?string The wikitext value, or null if the attribute is not present.

◆ getAttributeValueAsShadowInfo()

Wikimedia\Parsoid\Html2Wt\WikitextSerializer::getAttributeValueAsShadowInfo ( Element $node,
string $key )
Parameters
Element $node
string $key
Returns
array|null A tuple in WTSUtils::getShadowInfo() format, with an extra 'fromDataMW' flag.

◆ handleLIHackIfApplicable()

Wikimedia\Parsoid\Html2Wt\WikitextSerializer::handleLIHackIfApplicable ( Element $node)

FIXME: Get rid of this function after content version 2.2.0 has expired from caches.

Parameters
Element $node

◆ htmlToWikitext()

Wikimedia\Parsoid\Html2Wt\WikitextSerializer::htmlToWikitext ( array $opts,
string $html )
Parameters
array $opts
string $html
Returns
string

◆ languageVariantHandler()

Wikimedia\Parsoid\Html2Wt\WikitextSerializer::languageVariantHandler ( Node $node)
Parameters
Element $node
Returns
void

◆ linkHandler()

Wikimedia\Parsoid\Html2Wt\WikitextSerializer::linkHandler ( Element $node)

Main link handler.

Parameters
Element $node  Used in multiple tag handlers (<a> and <link>), and hence added as a top-level method.

◆ serializeAttributes()

Wikimedia\Parsoid\Html2Wt\WikitextSerializer::serializeAttributes ( Element $node,
Token $token,
bool $isWt = false )
Parameters
Element $node
Token $token
bool $isWt
Returns
string

◆ serializedAttrVal()

Wikimedia\Parsoid\Html2Wt\WikitextSerializer::serializedAttrVal ( Element $node,
string $name )
Parameters
Element $node
string $name
Returns
array

◆ serializedImageAttrVal()

Wikimedia\Parsoid\Html2Wt\WikitextSerializer::serializedImageAttrVal ( Element $dataMWnode,
Element $htmlAttrNode,
string $key )
Parameters
Element $dataMWnode
Element $htmlAttrNode
string $key
Returns
array A tuple in WTSUtils::getShadowInfo() format, possibly with an extra 'fromDataMW' flag.

◆ serializeDOM()

Wikimedia\Parsoid\Html2Wt\WikitextSerializer::serializeDOM ( Node $node,
bool $selserMode = false )

Serialize an HTML DOM.

WARNING: You probably want to use WikitextContentModelHandler::fromDOM instead.

Parameters
Document | DocumentFragment $node
bool $selserMode
Returns
string

◆ serializeExtensionStartTag()

Wikimedia\Parsoid\Html2Wt\WikitextSerializer::serializeExtensionStartTag ( Element $node,
SerializerState $state )
Parameters
Element $node
SerializerState $state
Returns
string

◆ serializeFromParts()

Wikimedia\Parsoid\Html2Wt\WikitextSerializer::serializeFromParts ( SerializerState $state,
Element $node,
array $srcParts )

Serialize a template from its parts.

Parameters
SerializerState $state
Element $node
stdClass[] $srcParts  Template parts from TemplateInfo::getDataMw()
Returns
string

◆ serializeHTMLEndTag()

Wikimedia\Parsoid\Html2Wt\WikitextSerializer::serializeHTMLEndTag ( Element $node,
$wrapperUnmodified )
Parameters
Element $node
bool $wrapperUnmodified
Returns
string

◆ serializeHTMLTag()

Wikimedia\Parsoid\Html2Wt\WikitextSerializer::serializeHTMLTag ( Element $node,
bool $wrapperUnmodified )
Parameters
Element $node
bool $wrapperUnmodified
Returns
string

◆ tagNeedsEscaping()

Wikimedia\Parsoid\Html2Wt\WikitextSerializer::tagNeedsEscaping ( string $name)

Check if token needs escaping.

Parameters
string $name
Returns
bool

◆ trace()

Wikimedia\Parsoid\Html2Wt\WikitextSerializer::trace ( ... $args)
Note
Porting note: this replaces the pattern $serializer->env->log( $serializer->logType, ... )
Parameters
mixed ...$args
Deprecated
Use PSR-3 logging instead

◆ wrapAngleBracket()

Wikimedia\Parsoid\Html2Wt\WikitextSerializer::wrapAngleBracket ( Token $token,
string $inner )
Parameters
Token $token
string $inner
Returns
string

The documentation for this class was generated from the following file: