Language Converter
Finite-State Transducer implementation of MediaWiki LanguageConverter
Public Member Functions
__construct( $baseLanguage, $codes )
    ReplacementMachine constructor.
getCodes()
    Return the set of language codes supported.
loadFST( string $filename, bool $justBrackets = false )
    Load a conversion machine from a pFST file with filename $filename from the fst directory.
countBrackets( string $s, $destCode, $invertCode )
    Quantify a guess about the "native" language of string $s.
convert( $document, $s, $destCode, $invertCode )
    Convert a string of text.
Public Member Functions inherited from Wikimedia\LangConv\ReplacementMachine
__construct()
    ReplacementMachine constructor.
isValidCodePair( $destCode, $invertCode )
    Override this method in a subclass if you want to limit the possible code pairs bracketed.
replace( $textNode, $destCode, $invertCode )
    Replace the given text node with converted text, protecting anything that cannot be round-tripped back to $invertCode by wrapping it in appropriate synthetic language-converter markup.
jsonEncode( array $obj )
    Allow clients to customize the JSON encoding of data-mw-variant attributes.
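The isValidCodePair() entry above invites subclasses to restrict which code pairs get bracketed. The following is a minimal sketch of such an override, under stated assumptions: `StandInMachine` is a hypothetical stand-in for the real base class so the snippet runs without the library (real code would extend Wikimedia\LangConv\ReplacementMachine or one of its subclasses), its accept-every-pair default is an assumption, and the codes 'sr-ec' and 'sr-el' are examples only.

```php
<?php
// Hypothetical stand-in for the library's base class, so this sketch runs on
// its own. It is NOT Wikimedia\LangConv\ReplacementMachine; accepting every
// pair by default is an assumption made for illustration.
class StandInMachine {
	public function isValidCodePair( $destCode, $invertCode ): bool {
		return true;
	}
}

// Hypothetical subclass: only bracket round trips between two example variants.
class RestrictedMachine extends StandInMachine {
	// Map each destination code to the invert codes we are willing to bracket.
	private const ALLOWED = [
		'sr-ec' => [ 'sr-el' ],
		'sr-el' => [ 'sr-ec' ],
	];

	public function isValidCodePair( $destCode, $invertCode ): bool {
		if ( !array_key_exists( $destCode, self::ALLOWED ) ) {
			return false; // unknown destination: never bracket
		}
		return in_array( $invertCode, self::ALLOWED[$destCode], true );
	}
}
```

Keeping the allowed pairs in a constant table makes the policy easy to audit; any pair not listed is rejected.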
Wikimedia\LangConv\FstReplacementMachine::__construct( $baseLanguage, $codes )
ReplacementMachine constructor.
Parameters
    string    $baseLanguage
    string[]  $codes
Wikimedia\LangConv\FstReplacementMachine::convert( $document, $s, $destCode, $invertCode )
Convert a string of text.
Parameters
    DOMDocument  $document
    string       $s           text to convert
    string       $destCode    destination language code
    string       $invertCode
Reimplemented from Wikimedia\LangConv\ReplacementMachine.
Wikimedia\LangConv\FstReplacementMachine::countBrackets( string $s, $destCode, $invertCode )
Quantify a guess about the "native" language of string $s.
We will be converting to $destCode, and our guess is that when we round trip we'll want to convert back to $invertCode (so $invertCode is our guess about the actual language of $s). If we were to make this encoding, the returned value unsafe is the number of codepoints we'd have to specially escape, safe is the number of codepoints we wouldn't have to escape, and len is the total number of codepoints in $s. Generally, lower values of unsafe indicate a better guess for $invertCode.
Parameters
    string  $s
    string  $destCode
    string  $invertCode
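To make the safe/unsafe/len bookkeeping concrete, here is a toy model. It swaps the finite-state transducers for plain per-codepoint lookup tables, an assumption made purely for illustration (the real machines are loaded from pFST files and need not work codepoint by codepoint), and counts a codepoint as safe when converting it to the destination and back reproduces it. `toyCountBrackets()` and `bestInvertCode()` are hypothetical names, not part of the library.

```php
<?php
// Toy model of the countBrackets() idea. Lookup tables stand in for the FSTs:
// $toDest converts a codepoint toward $destCode, $toInvert converts back toward
// the guessed $invertCode. Unmapped codepoints pass through unchanged.
function toyCountBrackets( string $s, array $toDest, array $toInvert ): array {
	$chars = preg_split( '//u', $s, -1, PREG_SPLIT_NO_EMPTY ); // split into codepoints
	$safe = 0;
	$unsafe = 0;
	foreach ( $chars as $c ) {
		$converted = $toDest[$c] ?? $c;
		$back = $toInvert[$converted] ?? $converted;
		if ( $back === $c ) {
			$safe++;   // survives the round trip, no escaping needed
		} else {
			$unsafe++; // would need special escaping
		}
	}
	return [ 'safe' => $safe, 'unsafe' => $unsafe, 'len' => count( $chars ) ];
}

// Pick the guess for invertCode with the fewest unsafe codepoints.
// Returns null when no candidates are given.
function bestInvertCode( array $countsByCode ): ?string {
	$best = null;
	foreach ( $countsByCode as $code => $counts ) {
		if ( $best === null || $counts['unsafe'] < $countsByCode[$best]['unsafe'] ) {
			$best = $code;
		}
	}
	return $best;
}
```

For instance, with a lossy table that sends both `a` and `b` to `A` and an inverse that sends `A` back to `a`, `toyCountBrackets( 'abc', [ 'a' => 'A', 'b' => 'A' ], [ 'A' => 'a' ] )` reports 2 safe, 1 unsafe and a length of 3: `b` cannot survive the round trip, while the unmapped `c` passes through unchanged.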
Wikimedia\LangConv\FstReplacementMachine::getCodes()
Return the set of language codes supported.
Both key and value are set in order to facilitate inclusion testing.
Reimplemented from Wikimedia\LangConv\ReplacementMachine.
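Storing each code as both key and value means membership is a constant-time isset() check, while iterating the array still yields the codes themselves. A small sketch of building such a set with array_combine(); `toCodeSet()` is a hypothetical helper, and the codes are examples only, not necessarily what this class returns.

```php
<?php
// Hypothetical helper: turn a list of codes into a key => value "set" in the
// shape getCodes() describes (each code is both key and value).
function toCodeSet( array $codes ): array {
	return array_combine( $codes, $codes );
}

$supported = toCodeSet( [ 'sr-ec', 'sr-el' ] ); // example codes only

// Inclusion test is a key lookup rather than a linear in_array() scan.
$hasLatin = isset( $supported['sr-el'] );
```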
Wikimedia\LangConv\FstReplacementMachine::loadFST( string $filename, bool $justBrackets = false )
Load a conversion machine from a pFST file with filename $filename from the fst directory.
Parameters
    string  $filename      filename, omitting the .pfst file extension
    bool    $justBrackets  whether to return only the bracket locations
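Because $filename omits the .pfst extension, the loader has to append it and prefix the fst directory. A hypothetical helper illustrating that path construction; the `pfstPath()` name and the directory argument are assumptions, since where the library keeps its fst directory is not documented here.

```php
<?php
// Hypothetical helper: build the on-disk path for a machine name, mirroring the
// loadFST() description (fst directory + filename + ".pfst" extension).
function pfstPath( string $fstDir, string $filename ): string {
	// Trim any trailing slash so we never produce "dir//name.pfst".
	return rtrim( $fstDir, '/' ) . '/' . $filename . '.pfst';
}
```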