Parsoid
A bidirectional parser between wikitext and HTML5
A bidirectional Language Converter, capable of round-tripping variant conversion.
Classes
class ConstantLanguageGuesser
    A simple LanguageGuesser that returns the same "source language" for every node.
class ConversionTraverser
class CrhConverter
class EnConverter
class KuConverter
class Language
    Base class for Language objects.
class LanguageConverter
    Base class for language variant conversion.
class LanguageCrh
    Crimean Tatar (Qırımtatarca) conversion code.
class LanguageEn
    English (and Pig Latin) conversion code.
class LanguageGuesser
    An oracle that predicts the "source language" of every node in a DOM; the prediction is used when converting the result back to the source language during round-tripping.
class LanguageKu
    Kurdish conversion code.
class LanguageSr
    Serbian (Српски / Srpski) specific code.
class LanguageZh
    Chinese conversion code.
class MachineLanguageGuesser
    Uses a ReplacementMachine to predict the best "source language" for every node in a DOM.
class SrConverter
class ZhConverter
Language conversion is implemented as a DOMPostProcessor pass that runs over the Parsoid-format HTML output, which may contain embedded language converter rules. First, each DOM node is assigned a (guessed) wikitext variant: the variant in which we expect the original wikitext was written. This guess is used when round-tripping the result back to the original wikitext variant. Then, for each applicable text node in the DOM, the text is "bracketed": split into segments that round-trip cleanly and segments that are lossy (unclean). For the lossy segments, additional metadata is added to the output to record the original wikitext, which allows round-tripping (and variant-aware editing).
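The bracketing step described above can be illustrated with a toy Python sketch. It is not Parsoid's implementation: it assumes per-character conversion functions (real ReplacementMachines match multi-character patterns), and the names `bracket`, `render`, `to_target`, `to_source`, and the `data-orig` attribute are all hypothetical. A character counts as clean if converting it to the target variant and back reproduces it exactly; consecutive clean or lossy characters are grouped into segments, and each lossy segment keeps its original text so it can be emitted as metadata.

```python
from dataclasses import dataclass
from html import escape
from typing import Callable, List


@dataclass
class Segment:
    text: str       # text in the target (display) variant
    original: str   # corresponding text in the wikitext variant
    clean: bool     # True if the segment round-trips without loss


def to_target(s: str) -> str:
    """Hypothetical lossy conversion: both 'á' and 'a' become 'a'."""
    return s.replace("á", "a")


def to_source(s: str) -> str:
    """Hypothetical reverse conversion: it cannot recover the accent."""
    return s


def bracket(src: str,
            forward: Callable[[str], str],
            backward: Callable[[str], str]) -> List[Segment]:
    """Split src into maximal runs that do / do not round-trip cleanly."""
    segments: List[Segment] = []
    for ch in src:
        out = forward(ch)
        clean = backward(out) == ch
        if segments and segments[-1].clean == clean:
            # Same kind as the current run: extend it.
            segments[-1].text += out
            segments[-1].original += ch
        else:
            segments.append(Segment(out, ch, clean))
    return segments


def render(segments: List[Segment]) -> str:
    """Emit HTML; lossy segments carry their original text in a
    hypothetical data-orig attribute so an edit can be round-tripped."""
    parts = []
    for seg in segments:
        if seg.clean:
            parts.append(escape(seg.text))
        else:
            parts.append(
                f'<span data-orig="{escape(seg.original)}">{escape(seg.text)}</span>'
            )
    return "".join(parts)
```

For example, `render(bracket("cáb", to_target, to_source))` yields `c<span data-orig="á">a</span>b`: only the accented letter is lossy, so only it needs extra metadata, while clean runs are emitted as plain text.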
Note that different wikis have different policies for the wikitext variant. In some wikis, all articles are written in one particular variant by convention. In others, the first author of an article gets to choose the variant. In both cases, a constant (site-wide) or per-article "wikitext variant" may be specified via some as-yet-unimplemented mechanism, either as part of the site configuration or as per-article metadata such as pageLanguage. In still other wikis (such as zhwiki), the text is an arbitrary mix of variants; in these cases the "wikitext variant" is null/unspecified, and the most likely wikitext variant is picked dynamically for each subtree.
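The two policies above correspond to the ConstantLanguageGuesser and MachineLanguageGuesser classes listed earlier. The Python sketch below mirrors that split under stated assumptions: every class and function name is hypothetical, and the dynamic guesser scores each subtree by counting characters of each variant's script (via `unicodedata.name`), a deliberately simpler heuristic swapped in for the ReplacementMachine-based prediction that MachineLanguageGuesser is described as using.

```python
import unicodedata
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass(eq=False)  # keep identity hashing so nodes can key a cache
class Node:
    text: str = ""
    children: List["Node"] = field(default_factory=list)


class LanguageGuesser:
    """Predicts the wikitext ("source") variant of a DOM node."""

    def guess(self, node: Node) -> str:
        raise NotImplementedError


class ConstantGuesser(LanguageGuesser):
    """Same variant for every node (site-wide or per-article setting)."""

    def __init__(self, variant: str):
        self.variant = variant

    def guess(self, node: Node) -> str:
        return self.variant


class ScriptCountGuesser(LanguageGuesser):
    """Picks, per subtree, the variant whose scorer rates its text highest.

    Scores are summed bottom-up and cached per node, so guessing every
    node in a tree needs only one scoring pass over its text.
    """

    def __init__(self, scorers: Dict[str, Callable[[str], int]]):
        self.scorers = scorers
        self._cache: Dict[Node, Counter] = {}

    def _scores(self, node: Node) -> Counter:
        if node not in self._cache:
            total = Counter({v: score(node.text) for v, score in self.scorers.items()})
            for child in node.children:
                total.update(self._scores(child))
            self._cache[node] = total
        return self._cache[node]

    def guess(self, node: Node) -> str:
        scores = self._scores(node)
        # Sorted iteration: ties go to the alphabetically first variant.
        return max(sorted(self.scorers), key=lambda v: scores[v])


def make_guesser(wikitext_variant: Optional[str],
                 scorers: Dict[str, Callable[[str], int]]) -> LanguageGuesser:
    """Constant guesser if a wikitext variant is configured, else dynamic."""
    if wikitext_variant is not None:
        return ConstantGuesser(wikitext_variant)
    return ScriptCountGuesser(scorers)


def count_script(prefix: str) -> Callable[[str], int]:
    """Scorer counting characters whose Unicode name starts with prefix."""
    return lambda s: sum(1 for c in s if unicodedata.name(c, "").startswith(prefix))


SERBIAN_SCORERS = {"sr-ec": count_script("CYRILLIC"), "sr-el": count_script("LATIN")}
```

With a null wikitext variant, a subtree containing "Здраво" and "svete" is guessed as `sr-ec` overall (six Cyrillic letters against five Latin ones), while the "svete" child alone is guessed as `sr-el`, which is the per-subtree behaviour the paragraph describes for mixed-variant wikis.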
Each individual language has a dynamically-loaded subclass of Language, which may also have a LanguageConverter subclass to load appropriate ReplacementMachines and do other language-specific customizations.
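The per-language loading described above can be sketched as follows. This is a minimal illustration, assuming a naming convention in which language code "sr" maps to class LanguageSr (as the class names listed on this page suggest); `load_language`, `converter_class`, and the fallback to the base class are hypothetical and not Parsoid's actual loader.

```python
from typing import Optional, Type


class LanguageConverter:
    """Base converter; subclasses would load their ReplacementMachines."""

    def __init__(self, language: "Language"):
        self.language = language


class SrConverter(LanguageConverter):
    """Placeholder for Serbian-specific machines and customizations."""


class Language:
    """Base class; languages without variants use it unchanged."""

    converter_class: Optional[Type[LanguageConverter]] = None

    def __init__(self, code: str):
        self.code = code
        self.converter: Optional[LanguageConverter] = (
            self.converter_class(self) if self.converter_class else None
        )


class LanguageSr(Language):
    converter_class = SrConverter


def load_language(code: str) -> Language:
    """Resolve e.g. "sr" or "sr-ec" to LanguageSr by naming convention,
    falling back to the base Language class for unknown codes."""
    name = "Language" + code.split("-")[0].capitalize()
    cls = globals().get(name)
    if isinstance(cls, type) and issubclass(cls, Language):
        return cls(code)
    return Language(code)
```

The `globals()` lookup only mimics name-based dynamic loading; an explicit dictionary from language code to class would be easier to audit and would fail more predictably on typos.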