Parsoid
A bidirectional parser between wikitext and HTML5
Wikimedia\Parsoid\Html2Wt\WikitextSerializer Class Reference

HTML to wikitext serializer.

Public Member Functions

 __construct (Env $env, $options)
 
 linkHandler (Element $node)
 Main link handler.
 
 languageVariantHandler (Node $node)
 
 escapeWikitext (SerializerState $state, string $text, array $opts)
 Escape wikitext-like strings in '$text' so that $text renders as a plain string when rendered as HTML.
 
 domToWikitext (array $opts, DocumentFragment $node)
 
 htmlToWikitext (array $opts, string $html)
 
 getAttributeKey (Element $node, string $key)
 
 getAttributeValue (Element $node, string $key)
 
 getAttributeValueAsShadowInfo (Element $node, string $key)
 
 serializedImageAttrVal (Element $dataMWnode, Element $htmlAttrNode, string $key)
 
 serializedAttrVal (Element $node, string $name)
 
 tagNeedsEscaping (string $name)
 Check if token needs escaping.
 
 wrapAngleBracket (Token $token, string $inner)
 
 serializeHTMLTag (Element $node, bool $wrapperUnmodified)
 
 serializeHTMLEndTag (Element $node, $wrapperUnmodified)
 
 serializeAttributes (Element $node, Token $token, bool $isWt=false)
 
 handleLIHackIfApplicable (Element $node)
 FIXME: Get rid of this function after content version 2.2.0 has expired from caches.
 
 serializeFromParts (SerializerState $state, Element $node, array $srcParts)
 Serialize a template from its parts.
 
 serializeExtensionStartTag (Element $node, SerializerState $state)
 
 defaultExtensionHandler (Element $node, SerializerState $state)
 
 emitWikitext (string $res, Node $node)
 Emit non-separator wikitext that does not need to be escaped.
 
 serializeDOM (Node $node, bool $selserMode=false)
 Serialize an HTML DOM.
 
 trace (... $args)
 

Public Attributes

 $wteHandlers
 
 $env
 

Detailed Description

HTML to wikitext serializer.

Serializes a chunk of tokens or an HTML DOM to MediaWiki's wikitext flavor.

This serializer is designed to eventually

  • accept arbitrary HTML and
  • serialize that to wikitext in a way that round-trips back to the same HTML DOM as far as possible within the limitations of wikitext.

Little effort has been invested so far in supporting HTML that was not generated by Parsoid or VisualEditor (VE). Part of this work involves adaptively switching between wikitext and HTML representations based on attribute values and DOM context. A few special cases are already handled adaptively (for example, multi-paragraph list item contents are serialized as HTML tags, and generic A elements are serialized to HTML A tags), but in general this support is mostly missing.

Example issue:

<h1><p>foo</p></h1> will serialize to =\nfoo\n= whereas the
correct serialized output would be: =<p>foo</p>=

What to do about this?

  • add a generic 'can this HTML node be serialized to wikitext in this context' detection method and use that to adaptively switch between wikitext and HTML serialization.
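The detection method proposed above might look roughly like the following sketch. It is not Parsoid code: the names `canSerializeAsWikitext` and `serializeHeading`, the `'heading'` context string, and the short list of block-level tags are all assumptions made for illustration, and the sketch uses PHP's built-in DOM extension rather than Parsoid's own DOM classes.

```php
<?php
// Hypothetical sketch; none of these names exist in Parsoid.

/** Tags treated as block-level here (an assumed, incomplete list). */
const BLOCK_TAGS = [ 'p', 'div', 'ul', 'ol', 'table', 'blockquote', 'pre' ];

/**
 * Decide whether $node can use wikitext syntax in the given context,
 * or whether the serializer should fall back to literal HTML tags.
 */
function canSerializeAsWikitext( DOMElement $node, string $context ): bool {
	// A wikitext heading (= ... =) must stay on one line, so a block
	// element inside it cannot use newline-based wikitext syntax.
	if ( $context === 'heading' && in_array( $node->nodeName, BLOCK_TAGS, true ) ) {
		return false;
	}
	return true;
}

/** Serialize the children of a heading element, switching adaptively. */
function serializeHeading( DOMElement $heading ): string {
	$out = '';
	foreach ( $heading->childNodes as $child ) {
		if ( $child instanceof DOMElement
			&& !canSerializeAsWikitext( $child, 'heading' )
		) {
			// Fall back to the child's literal markup.
			$out .= $heading->ownerDocument->saveXML( $child );
		} else {
			$out .= $child->textContent;
		}
	}
	return '=' . $out . '=';
}

// Build <h1><p>foo</p></h1> programmatically.
$doc = new DOMDocument();
$h1 = $doc->createElement( 'h1' );
$h1->appendChild( $doc->createElement( 'p', 'foo' ) );
$doc->appendChild( $h1 );

echo serializeHeading( $h1 ), "\n"; // =<p>foo</p>=
```

Keeping the decision in one predicate means each tag handler can ask the same question instead of hard-coding its own special cases.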

Constructor & Destructor Documentation

◆ __construct()

Wikimedia\Parsoid\Html2Wt\WikitextSerializer::__construct ( Env  $env,
  $options 
)
Parameters
Env $env
array $options List of options for serialization:
  • logType: (string)
  • extName: (string)

Member Function Documentation

◆ emitWikitext()

Wikimedia\Parsoid\Html2Wt\WikitextSerializer::emitWikitext ( string  $res,
Node  $node 
)

Emit non-separator wikitext that does not need to be escaped.

Parameters
string $res
Node $node

◆ escapeWikitext()

Wikimedia\Parsoid\Html2Wt\WikitextSerializer::escapeWikitext ( SerializerState  $state,
string  $text,
array  $opts 
)

Escape wikitext-like strings in '$text' so that $text renders as a plain string when rendered as HTML.

The escaping is done based on the context in which $text is present (ex: start-of-line, in a link, etc.)

Parameters
SerializerState $state
string $text
array $opts
  • node: (Node)
  • isLastChild: (bool)
Returns
string
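To make the idea of context-dependent escaping concrete, here is a deliberately simplified sketch. It is not how Parsoid implements `escapeWikitext()`: the function `escapeForContext` and its `$atStartOfLine` flag are hypothetical, and only two rules are modelled (link brackets anywhere, list and indent markers at the start of a line).

```php
<?php
// Hypothetical sketch; Parsoid's real escaping logic is far more involved.

/**
 * Escape $text so that it renders literally, depending on where it
 * will be emitted.
 *
 * @param string $text Plain text to emit.
 * @param bool $atStartOfLine Whether the text begins a new wikitext line.
 * @return string
 */
function escapeForContext( string $text, bool $atStartOfLine ): string {
	// Link brackets are significant anywhere in a line.
	if ( strpos( $text, '[[' ) !== false ) {
		return '<nowiki>' . $text . '</nowiki>';
	}
	// List and indent markers only matter at the start of a line,
	// so only the leading character needs wrapping.
	if ( $atStartOfLine && preg_match( '/^[*#:;]/', $text ) ) {
		return '<nowiki>' . $text[0] . '</nowiki>' . substr( $text, 1 );
	}
	return $text;
}

echo escapeForContext( '* not a list', true ), "\n";  // <nowiki>*</nowiki> not a list
echo escapeForContext( '* not a list', false ), "\n"; // * not a list
echo escapeForContext( 'see [[Foo]]', false ), "\n";  // <nowiki>see [[Foo]]</nowiki>
```

The same input string is escaped differently in the first two calls, which is the point of passing context (here a single flag, in the real method a SerializerState) alongside the text.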

◆ getAttributeValue()

Wikimedia\Parsoid\Html2Wt\WikitextSerializer::getAttributeValue ( Element  $node,
string  $key 
)
Parameters
Element $node
string $key Attribute name.
Returns
?string The wikitext value, or null if the attribute is not present.

◆ getAttributeValueAsShadowInfo()

Wikimedia\Parsoid\Html2Wt\WikitextSerializer::getAttributeValueAsShadowInfo ( Element  $node,
string  $key 
)
Parameters
Element $node
string $key
Returns
array|null A tuple in WTSUtils::getShadowInfo() format, with an extra 'fromDataMW' flag.

◆ handleLIHackIfApplicable()

Wikimedia\Parsoid\Html2Wt\WikitextSerializer::handleLIHackIfApplicable ( Element  $node)

FIXME: Get rid of this function after content version 2.2.0 has expired from caches.

Parameters
Element $node

◆ languageVariantHandler()

Wikimedia\Parsoid\Html2Wt\WikitextSerializer::languageVariantHandler ( Node  $node)
Parameters
Element $node

◆ linkHandler()

Wikimedia\Parsoid\Html2Wt\WikitextSerializer::linkHandler ( Element  $node)

Main link handler.

Parameters
Element $node Used in multiple tag handlers (<a> and <link>), and hence added as a top-level method

◆ serializedImageAttrVal()

Wikimedia\Parsoid\Html2Wt\WikitextSerializer::serializedImageAttrVal ( Element  $dataMWnode,
Element  $htmlAttrNode,
string  $key 
)
Parameters
Element $dataMWnode
Element $htmlAttrNode
string $key
Returns
array A tuple in WTSUtils::getShadowInfo() format, possibly with an extra 'fromDataMW' flag.

◆ serializeDOM()

Wikimedia\Parsoid\Html2Wt\WikitextSerializer::serializeDOM ( Node  $node,
bool  $selserMode = false 
)

Serialize an HTML DOM.

WARNING: You probably want to use WikitextContentModelHandler::fromDOM instead.

Parameters
Document|DocumentFragment $node
bool $selserMode
Returns
string

◆ serializeFromParts()

Wikimedia\Parsoid\Html2Wt\WikitextSerializer::serializeFromParts ( SerializerState  $state,
Element  $node,
array  $srcParts 
)

Serialize a template from its parts.

Parameters
SerializerState $state
Element $node
list<stdClass|string> $srcParts Template parts from TemplateInfo::getDataMw()
Returns
string

◆ serializeHTMLEndTag()

Wikimedia\Parsoid\Html2Wt\WikitextSerializer::serializeHTMLEndTag ( Element  $node,
  $wrapperUnmodified 
)
Parameters
Element $node
bool $wrapperUnmodified
Returns
string

◆ tagNeedsEscaping()

Wikimedia\Parsoid\Html2Wt\WikitextSerializer::tagNeedsEscaping ( string  $name)

Check if token needs escaping.

Parameters
string $name
Returns
bool

◆ trace()

Wikimedia\Parsoid\Html2Wt\WikitextSerializer::trace ( ...  $args)
Note
Porting note: this replaces the pattern $serializer->env->log( $serializer->logType, ... )
Parameters
mixed ...$args

The documentation for this class was generated from the following file: