Parsoid
A bidirectional parser between wikitext and HTML5
Wikimedia\Parsoid\Language Namespace Reference

A bidirectional Language Converter, capable of round-tripping variant conversion.

Classes

class  ConstantLanguageGuesser
 A simple LanguageGuesser that returns the same "source language" for every node.
 
class  ConversionTraverser
 
class  CrhConverter
 
class  EnConverter
 
class  KuConverter
 
class  Language
 Base class for Language objects.
 
class  LanguageConverter
 Base class for language variant conversion.
 
class  LanguageCrh
 Crimean Tatar (Qırımtatarca) conversion code.
 
class  LanguageEn
 English ( / Pig Latin) conversion code.
 
class  LanguageGuesser
 An oracle that predicts a "source language" for every node in a DOM; the prediction is used when converting the result back to the source language during round-tripping.
 
class  LanguageKu
 Kurdish conversion code.
 
class  LanguageSr
 Serbian (Српски / Srpski) specific code.
 
class  LanguageZh
 Chinese conversion code.
 
class  MachineLanguageGuesser
 Use a ReplacementMachine to predict the best "source language" for every node in a DOM.
 
class  SrConverter
 
class  ZhConverter
 

Detailed Description

Language conversion is a DOMPostProcessor pass, run over the Parsoid-format HTML output, which may have embedded language converter rules. We first assign a (guessed) source variant to each DOM node, which will be used when round-tripping the result back to the original source variant. Then, for each applicable text node in the DOM, we "bracket" the text, splitting it into cleanly round-trippable segments and lossy/unclean segments. For the lossy segments we add metadata to the output that records the original source variant text, which allows round-tripping (and variant-aware editing).
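The bracketing step can be illustrated with a minimal sketch. It is not Parsoid's implementation: the real pass works on DOM text nodes and uses ReplacementMachines, whereas this toy works on plain strings with two hypothetical character tables (FORWARD and REVERSE), and the Segment, convert and bracket names are invented for illustration. A character counts as cleanly round-trippable when converting it forward and back reproduces it; runs of lossy characters keep their original text as metadata.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import Optional

# Hypothetical toy tables (assumptions, not real conversion data).
# Both "x" and "q" map to "k" in the target variant, so "q" cannot be
# recovered by the reverse table and is therefore lossy.
FORWARD = {"x": "k", "q": "k"}
REVERSE = {"k": "x"}


@dataclass
class Segment:
    converted: str           # text in the target variant
    original: Optional[str]  # source text, recorded only for lossy segments


def convert(text: str, table: dict) -> str:
    """Apply a character-by-character replacement table."""
    return "".join(table.get(ch, ch) for ch in text)


def bracket(text: str) -> list:
    """Split text into maximal runs of clean or lossy characters."""

    def is_clean(ch: str) -> bool:
        # Clean means: forward conversion followed by reverse conversion
        # gives back the original character.
        return convert(convert(ch, FORWARD), REVERSE) == ch

    segments = []
    for clean, chars in groupby(text, key=is_clean):
        run = "".join(chars)
        segments.append(Segment(convert(run, FORWARD), None if clean else run))
    return segments
```

Calling bracket("xaqx") yields three segments: a clean "ka", a lossy "k" that remembers its original "q", and a clean "k". Only the middle segment needs extra metadata to round-trip.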

Note that different wikis have different policies for source variant: in some wikis all articles are authored in one particular variant, by convention. In others, it's a "first author gets to choose the variant" situation. In both cases, a constant/per-article "source variant" may be specified via some as-yet-unimplemented mechanism: either part of the site configuration, or per-article metadata like pageLanguage. In other wikis (like zhwiki) the text is a random mix of variants; in these cases the "source variant" will be null/unspecified, and we'll dynamically pick the most likely source variant for each subtree.
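The choice between a constant and a guessed source variant might look like the following sketch. It is only an illustration under stated assumptions: the function name guess_source_variant and its configured parameter are hypothetical, the subtree is represented by its text alone, and the script-counting heuristic stands in for the ReplacementMachine-based prediction that MachineLanguageGuesser performs. Serbian is used because its two variants, sr-ec (Cyrillic) and sr-el (Latin), are easy to tell apart by script.

```python
import unicodedata
from typing import Optional


def guess_source_variant(subtree_text: str,
                         configured: Optional[str] = None) -> str:
    """Return the source variant for one subtree.

    If a constant or per-article variant is configured, use it for every
    subtree. Otherwise (configured is None) guess from the text itself.
    """
    if configured is not None:
        return configured

    # Toy heuristic (an assumption, not Parsoid's algorithm): count
    # letters by script, using their Unicode character names.
    cyrillic = sum(1 for ch in subtree_text
                   if "CYRILLIC" in unicodedata.name(ch, ""))
    latin = sum(1 for ch in subtree_text
                if "LATIN" in unicodedata.name(ch, ""))
    return "sr-ec" if cyrillic > latin else "sr-el"
```

With no configured variant, guess_source_variant("Српски") returns "sr-ec" and guess_source_variant("Srpski") returns "sr-el"; passing configured="sr-el" returns "sr-el" regardless of the text.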

Each individual language has a dynamically-loaded subclass of Language, which may also have a LanguageConverter subclass to load appropriate ReplacementMachines and do other language-specific customizations.
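The relationship between the Language subclasses and their optional converters can be sketched as below. The class names mirror those listed on this page, but everything else is assumed for illustration: the load_language function, the name-based lookup, the converter_class attribute, and the placeholder strings standing in for ReplacementMachine objects are not taken from Parsoid's code.

```python
class LanguageConverter:
    """Stand-in for the base converter class."""

    def __init__(self, lang_code: str):
        self.lang_code = lang_code
        self.machines = self.load_machines()

    def load_machines(self) -> list:
        # The base class loads nothing; subclasses override this.
        return []


class SrConverter(LanguageConverter):
    def load_machines(self) -> list:
        # Placeholder strings stand in for ReplacementMachine objects.
        return ["sr-ec", "sr-el"]


class Language:
    """Stand-in for the base Language class (no variant conversion)."""

    converter_class = None

    def __init__(self, lang_code: str):
        self.lang_code = lang_code

    def get_converter(self):
        cls = self.converter_class
        return cls(self.lang_code) if cls is not None else None


class LanguageSr(Language):
    converter_class = SrConverter


# Hypothetical lookup table keyed by class name.
_LANGUAGE_CLASSES = {cls.__name__: cls for cls in (LanguageSr,)}


def load_language(lang_code: str) -> Language:
    """Pick the subclass named after the language code, else the base class."""
    name = "Language" + lang_code.capitalize()
    return _LANGUAGE_CLASSES.get(name, Language)(lang_code)
```

Here load_language("sr") returns a LanguageSr whose converter has loaded its machines, while a code with no dedicated subclass falls back to the base Language and has no converter.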