Parsoid
A bidirectional parser between wikitext and HTML5
A PFragment is a MediaWiki content fragment.
Public Member Functions
isEmpty ()
Returns true if this fragment is empty.
isAtomic ()
Returns true if this fragment contains no wikitext elements; that is, if ::asMarkedWikitext() given an empty strip state would return a single strip marker and add a single item to the strip state (representing $this).
isValid ()
As an optimization to avoid unnecessary copying, certain operations on fragments may be destructive or lead to aliasing.
getSrcOffsets ()
Return the region of the source document that corresponds to this fragment.
asDom (ParsoidExtensionAPI $ext, bool $release=false)
Return the fragment as a (prepared and loaded) DOM DocumentFragment belonging to the Parsoid top-level document.
asHtmlString (ParsoidExtensionAPI $ext)
Return the fragment as a string of HTML.
asMarkedWikitext (StripState $stripState)
This method returns a "wikitext string" in the legacy format.
registerFragmentClass (string $className)
Register a fragment type with the JSON deserialization code.
Static Public Member Functions
static fromSplitWt (array $pieces, ?DomSourceRange $srcOffset=null)
Helper function to create a new fragment from a mixed array of strings and fragments.
static newFromJsonArray (array $json)
static jsonClassHintFor (string $keyName)
static hint ()
Protected Member Functions
__construct (?DomSourceRange $srcOffsets)
toJsonArray ()
Static Protected Member Functions
static joinSourceRange (?DomSourceRange $first, ?DomSourceRange $second)
Helper function to append two source ranges.
Protected Attributes
DomSourceRange $srcOffsets
The original wikitext source range for this fragment, or null for synthetic content that corresponds to no part of the original authored text.
Static Protected Attributes
static array $FRAGMENT_TYPES
A PFragment is a MediaWiki content fragment.
PFragment is the input and output type for fragment generators in MediaWiki: magic variables, parser functions, templates, and extension tags. You can imagine that the P stands for "Parsoid", "Page", or "MediaWiki Content", but in reality it simply disambiguates this fragment type from the DOM DocumentFragment and any other fragments you might encounter.
PFragment is an abstract class, and content is lazily converted to the form demanded by a consumer. Converting forms often loses information or introduces edge cases, so we avoid conversion to intermediate forms and defer conversion in general as late as possible.
For example, consider this invocation: {{1x|'''bold''' <nowiki>fragment</nowiki>}}
If we were to flatten the argument "as a string" (the traditional approach), we would lose the bold face, and the <nowiki> would be tunneled through strip state. Alternatively, we could ask for it "as a source string", which corresponds to the original "raw" form, "'''bold''' <nowiki>fragment</nowiki>", and is often used to pass literal arguments, bypassing wikitext processing. Or we could ask for the argument "as HTML" or "as DOM", in which case it would be parsed as wikitext and returned as <b>bold</b> <span>fragment</span>, either as a possibly-unbalanced string ("as HTML") or as a balanced DOM tree ("as DOM"). These transformations can be irreversible: once we've converted to one representation we can't always recover the others.
But now consider the case where {{1x|...}} simply wants to return its argument: it doesn't need to force a specific representation. Instead, it can return the PFragment directly without losing information and allow the downstream consumer to choose the type it prefers. This also works for composition: a composite PFragment can be defined which defers evaluation of its components until demanded, and then applies the appropriate composition operation depending on the demanded result.
(WikitextPFragment is one such composite fragment type, which uses Parsoid to do the composition of wikitext and other fragments.)
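The deferred-conversion idea can be illustrated with a toy model. This is a minimal sketch, not Parsoid code: ToyFragment, ToyWikitext, and toy1x are hypothetical names invented for the illustration, and the "parser" recognises only '''bold'''. The point is that a pass-through generator returns the fragment object itself, so each consumer can still pick the representation it needs.

```php
<?php
// Toy model only: these classes are hypothetical and are NOT Parsoid's API.
abstract class ToyFragment {
    /** Render for a consumer that wants HTML. */
    abstract public function asHtml(): string;

    /** Render for a consumer that wants the raw source text. */
    abstract public function asSource(): string;
}

/** Holds authored wikitext and converts it only on demand. */
final class ToyWikitext extends ToyFragment {
    public function __construct( private string $src ) {
    }

    public function asHtml(): string {
        // Extremely simplified: only '''bold''' is recognised.
        $escaped = htmlspecialchars( $this->src, ENT_NOQUOTES );
        return preg_replace( "/'''(.+?)'''/", '<b>$1</b>', $escaped );
    }

    public function asSource(): string {
        return $this->src;
    }
}

/** A stand-in for {{1x|...}}: returns its argument untouched. */
function toy1x( ToyFragment $arg ): ToyFragment {
    return $arg;
}

$out = toy1x( new ToyWikitext( "'''bold''' text" ) );
echo $out->asSource(), "\n"; // '''bold''' text
echo $out->asHtml(), "\n";   // <b>bold</b> text
```

Because toy1x never calls a conversion method, no information is lost before the consumer decides which form it wants.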
Parsoid defines only those fragment types relevant to itself, and defines conversions (as*() methods) only for those formats it needs for HTML rendering. Extensions should feel free to define their own fragment types: as long as they are JsonCodecable and define one of ::asDom() or ::asHtmlString() they will interoperate with Parsoid and other extensions, albeit possibly as an opaque strip marker.
For example, Wikifunctions might define a PFragment for ZObjects, which would allow nested wikifunction invocations to transfer ZObjects between themselves without conversion through wikitext. Given {{#function:sum| {{#function:one}} }}, the sum function will be given a ZObjectPFragment containing the output of the one function, without forcing that value to be serialized to a wikitext string and then deserialized. With its special knowledge of the ZObjectPFragment type, Wikifunctions can use this to (say) preserve type information of the values. But if this same function is embedded into a wikitext template, as in {{1x| {{#function:one}} }}, then the value will be converted to wikitext or DOM as appropriate and composed onto the page in that form.
Wikimedia\Parsoid\Fragments\PFragment::asDom ( ParsoidExtensionAPI $ext, bool $release = false )
Return the fragment as a (prepared and loaded) DOM DocumentFragment belonging to the Parsoid top-level document.
If $release is true, then this PFragment will become invalid after this method returns.
Reimplemented in Wikimedia\Parsoid\Fragments\DomPFragment, Wikimedia\Parsoid\Fragments\LiteralStringPFragment, and Wikimedia\Parsoid\Fragments\WikitextPFragment.
Wikimedia\Parsoid\Fragments\PFragment::asHtmlString ( ParsoidExtensionAPI $ext )
Return the fragment as a string of HTML.
This method is very similar to asDom() but also supports fragmentary and unbalanced HTML, and therefore composition may yield unexpected results. This is a common type in legacy MediaWiki code, but use in new code should be discouraged. Data attributes will be represented as inline attributes, which may be suboptimal.
Reimplemented in Wikimedia\Parsoid\Fragments\DomPFragment, Wikimedia\Parsoid\Fragments\HtmlPFragment, and Wikimedia\Parsoid\Fragments\LiteralStringPFragment.
Wikimedia\Parsoid\Fragments\PFragment::asMarkedWikitext ( StripState $stripState )
This method returns a "wikitext string" in the legacy format.
Wikitext constructs will be parsed in the result. Constructs which are not representable in wikitext will be replaced with strip markers, and you will get a strip state which maps those markers back to PFragment objects. When you (for example) compose two marked strings and then ask for the result asDom, the strip markers in the marked strings will first be conceptually replaced with the PFragment from the StripState, and then the resulting interleaved strings and fragments will be composed.
Reimplemented in Wikimedia\Parsoid\Fragments\WikitextPFragment.
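The marker mechanism described for asMarkedWikitext() can be sketched with a toy strip state. This is an illustration under stated assumptions, not Parsoid's StripState: the ToyStripState class, its add() and expand() methods, and the marker format are all hypothetical, and the opaque content is a plain string rather than a PFragment.

```php
<?php
// Toy model only: ToyStripState and its marker format are hypothetical.
final class ToyStripState {
    /** @var array<string,string> marker => opaque content */
    private array $items = [];

    /** Store opaque content and return the marker that stands in for it. */
    public function add( string $content ): string {
        $marker = "\x7FTOY-" . count( $this->items ) . "\x7F";
        $this->items[$marker] = $content;
        return $marker;
    }

    /** Replace every known marker in $text with its stored content. */
    public function expand( string $text ): string {
        return strtr( $text, $this->items );
    }
}

$state = new ToyStripState();
// An opaque piece is tunnelled through the wikitext string as a marker.
$left = "before " . $state->add( '<span>opaque</span>' );
$right = " after";
// Composition happens on marked strings; markers are resolved afterwards.
$composed = $left . $right;
echo $state->expand( $composed ), "\n"; // before <span>opaque</span> after
```

The composed string carries only the marker; the opaque content is substituted back in when a final representation is requested.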
static Wikimedia\Parsoid\Fragments\PFragment::fromSplitWt ( array $pieces, ?DomSourceRange $srcOffset = null )
Helper function to create a new fragment from a mixed array of strings and fragments.
Unlike WikitextPFragment::newFromSplitWt() this method will not always return a WikitextPFragment; for example if only one non-empty piece is provided this method will just return that piece without casting it to a WikitextPFragment.
Parameters: list<string|PFragment> $pieces
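The "single non-empty piece" shortcut can be sketched as follows. This is a toy model, not the real implementation: ToyPiece, ToyComposite, and toyFromSplit are hypothetical names, and the handling of a lone plain string (wrapped in a composite here) is an assumption made for the sketch.

```php
<?php
// Toy model only: a hypothetical helper, not Parsoid's implementation.
final class ToyPiece {
    public function __construct( public string $label ) {
    }
}

final class ToyComposite {
    /** @param list<string|ToyPiece> $pieces */
    public function __construct( public array $pieces ) {
    }
}

/**
 * @param list<string|ToyPiece> $pieces
 * @return ToyPiece|ToyComposite
 */
function toyFromSplit( array $pieces ) {
    // Drop empty strings; they contribute nothing to the result.
    $nonEmpty = array_values( array_filter(
        $pieces,
        static fn ( $p ) => $p !== ''
    ) );
    // A lone fragment is returned as-is, keeping its concrete type.
    if ( count( $nonEmpty ) === 1 && $nonEmpty[0] instanceof ToyPiece ) {
        return $nonEmpty[0];
    }
    // Assumption: everything else is wrapped in a composite.
    return new ToyComposite( $nonEmpty );
}

$only = new ToyPiece( 'dom' );
var_dump( toyFromSplit( [ '', $only, '' ] ) === $only );            // bool(true)
var_dump( toyFromSplit( [ 'a', $only ] ) instanceof ToyComposite ); // bool(true)
```

Returning the lone piece unchanged means a caller that passes a single fragment gets back an object of the same type, rather than a wrapper around it.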
Wikimedia\Parsoid\Fragments\PFragment::isAtomic ( )
Returns true if this fragment contains no wikitext elements; that is, if ::asMarkedWikitext() given an empty strip state would return a single strip marker and add a single item to the strip state (representing $this).
Otherwise, returns false.
Reimplemented in Wikimedia\Parsoid\Fragments\WikitextPFragment.
Wikimedia\Parsoid\Fragments\PFragment::isEmpty ( )
Returns true if this fragment is empty.
This enables optimizations if implemented, but returns false by default.
Reimplemented in Wikimedia\Parsoid\Fragments\DomPFragment, Wikimedia\Parsoid\Fragments\HtmlPFragment, Wikimedia\Parsoid\Fragments\LiteralStringPFragment, and Wikimedia\Parsoid\Fragments\WikitextPFragment.
Wikimedia\Parsoid\Fragments\PFragment::isValid ( )
As an optimization to avoid unnecessary copying, certain operations on fragments may be destructive or lead to aliasing.
For ease of debugging, fragments so affected will return false from ::isValid(), and code is encouraged to assert the validity of fragments where convenient to do so.
See also the $release parameter of ::asDom() and DomPFragment::concat; other PFragment types with mutable non-value types might also provide accessors with $release parameters that interact with fragment validity.
Reimplemented in Wikimedia\Parsoid\Fragments\DomPFragment.
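The interaction between a $release flag and validity can be sketched with a toy class. This is an illustration only: ToyBuffer and its take() accessor are hypothetical and do not mirror DomPFragment. Since PHP arrays are value types, the sketch shows only the bookkeeping, not the copy that a real DOM-backed fragment would avoid.

```php
<?php
// Toy model only: ToyBuffer is hypothetical and is not DomPFragment.
final class ToyBuffer {
    /** @var list<string>|null null once the contents have been released */
    private ?array $nodes;

    /** @param list<string> $nodes */
    public function __construct( array $nodes ) {
        $this->nodes = $nodes;
    }

    public function isValid(): bool {
        return $this->nodes !== null;
    }

    /**
     * Hand out the contents; with $release the buffer gives them up.
     * @return list<string>
     */
    public function take( bool $release = false ): array {
        if ( !$this->isValid() ) {
            throw new LogicException( 'fragment used after release' );
        }
        $nodes = $this->nodes;
        if ( $release ) {
            // Ownership moves to the caller; this object is now invalid.
            $this->nodes = null;
        }
        return $nodes;
    }
}

$buf = new ToyBuffer( [ 'a', 'b' ] );
$buf->take();                // non-releasing access leaves the buffer usable
var_dump( $buf->isValid() ); // bool(true)
$buf->take( true );          // releasing access invalidates it
var_dump( $buf->isValid() ); // bool(false)
```

Asserting isValid() before each use, as take() does here, turns a silent use-after-release bug into an immediate error.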
Wikimedia\Parsoid\Fragments\PFragment::registerFragmentClass ( string $className )
Register a fragment type with the JSON deserialization code.
The given class should have a static constant named TYPE_HINT which gives the unique string property name which will distinguish serialized fragments of the given class.
Parameters: class-string<PFragment> $className
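One way such a registry might work is sketched here. This is a guess at the general technique, not Parsoid's codec: ToyRegistry, its register() and classFor() methods, and the ToyZObject class are hypothetical. Only the idea of a TYPE_HINT constant naming a distinguishing property comes from the description above.

```php
<?php
// Toy model only: ToyRegistry is hypothetical, not Parsoid's codec.
final class ToyRegistry {
    /** @var array<string,string> type hint => class name */
    private static array $types = [];

    public static function register( string $className ): void {
        // Each class names its own distinguishing JSON property.
        self::$types[$className::TYPE_HINT] = $className;
    }

    /** Find the class whose hint appears as a key in $json. */
    public static function classFor( array $json ): ?string {
        foreach ( self::$types as $hint => $className ) {
            if ( array_key_exists( $hint, $json ) ) {
                return $className;
            }
        }
        return null;
    }
}

final class ToyZObject {
    public const TYPE_HINT = 'toyZ';
}

ToyRegistry::register( ToyZObject::class );
echo ToyRegistry::classFor( [ 'toyZ' => [ 'value' => 1 ] ] ), "\n"; // ToyZObject
var_dump( ToyRegistry::classFor( [ 'other' => true ] ) );           // NULL
```

Keying the lookup on a property name that is unique per class is what lets the deserializer pick the right class without a separate type field.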