MediaWiki 1.27.3
Unicode normalization routines for working with UTF-8 strings.

Static Public Member Functions

static cleanUp ($string)
    The ultimate convenience function! Clean up invalid UTF-8 sequences, and convert to normal form C, canonical composition.
static quickIsNFC ($string)
    Returns true if the string is definitely in NFC.
static quickIsNFCVerify (&$string)
    Returns true if the string is definitely in NFC.
static toNFC ($string)
    Convert a UTF-8 string to normal form C, canonical composition.
static toNFD ($string)
    Convert a UTF-8 string to normal form D, canonical decomposition.
static toNFKC ($string)
    Convert a UTF-8 string to normal form KC, compatibility composition.
static toNFKD ($string)
    Convert a UTF-8 string to normal form KD, compatibility decomposition.

Detailed Description
Unicode normalization routines for working with UTF-8 strings.
Currently assumes that input strings are valid UTF-8! The exceptions are cleanUp() and quickIsNFCVerify(), which repair invalid sequences.
Not as fast as I'd like, but should be usable for most purposes. UtfNormal::toNFC() will bail early if given ASCII text or text it can quickly determine is already normalized.
All functions can be called statically.
See description of forms at http://www.unicode.org/reports/tr15/
Definition at line 48 of file UtfNormal.php.
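The early bail-out for ASCII input mentioned above can be sketched as follows. This is a minimal illustration, not the code in UtfNormal.php: the helper name isPureAscii() is hypothetical, and the sketch relies only on the fact that pure ASCII text is left unchanged by all four normalization forms.

```php
<?php
// Hypothetical helper, not part of UtfNormal: reports whether a string
// consists solely of 7-bit ASCII bytes.
function isPureAscii($string) {
    // Any byte in the range 0x80-0xFF means the string has non-ASCII content.
    return preg_match('/[\x80-\xff]/', $string) === 0;
}

// Hypothetical wrapper: skip the expensive work when the input is ASCII.
// $normalize is any callable that performs the real normalization.
function normalizeWithAsciiFastPath($string, $normalize) {
    if (isPureAscii($string)) {
        return $string; // ASCII is already in every normal form
    }
    return call_user_func($normalize, $string);
}
```

Inside MediaWiki the second argument could be something like array('UtfNormal', 'toNFC'), although toNFC() is documented as already performing this check itself.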
static UtfNormal::cleanUp ($string)

The ultimate convenience function! Clean up invalid UTF-8 sequences, and convert to normal form C, canonical composition.
Fast return for pure ASCII strings; some lesser optimizations for strings containing only known-good characters. Not as fast as toNFC().
Parameters:
    string $string: a UTF-8 string
Definition at line 59 of file UtfNormal.php.
static UtfNormal::quickIsNFC ($string)

Returns true if the string is definitely in NFC.
Returns false if the string is not in NFC, or if the quick check cannot tell.
Parameters:
    string $string: a valid UTF-8 string. Input is not validated.
Definition at line 116 of file UtfNormal.php.
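Because a false result from quickIsNFC() means "not NFC or uncertain", a caller can treat it only as a hint to fall back to full normalization. The sketch below shows that calling pattern with the check and the normalizer passed in as callables so that it runs outside MediaWiki; the function name normalizeIfNeeded() is hypothetical.

```php
<?php
// Hypothetical helper: run the full normalizer only when the quick check
// cannot confirm that the string is already normalized.
function normalizeIfNeeded($string, $quickCheck, $normalize) {
    if (call_user_func($quickCheck, $string)) {
        return $string; // definitely normalized, nothing to do
    }
    // Either not normalized or inconclusive: do the full work.
    return call_user_func($normalize, $string);
}
```

Within MediaWiki this might be called as normalizeIfNeeded($text, array('UtfNormal', 'quickIsNFC'), array('UtfNormal', 'toNFC')), assuming the class is loaded.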
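static UtfNormal::quickIsNFCVerify (&$string)

Returns true if the string is definitely in NFC.
Returns false if the string is not in NFC, or if the quick check cannot tell. Unlike quickIsNFC(), the string is passed by reference and may be modified.
Parameters:
    string $string: a UTF-8 string, altered on output to be valid UTF-8 safe for XML.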
Definition at line 126 of file UtfNormal.php.
static UtfNormal::toNFC ($string)

Convert a UTF-8 string to normal form C, canonical composition.
Fast return for pure ASCII strings; some lesser optimizations for strings containing only known-good characters.
Parameters:
    string $string: a valid UTF-8 string. Input is not validated.
Definition at line 71 of file UtfNormal.php.
static UtfNormal::toNFD ($string)

Convert a UTF-8 string to normal form D, canonical decomposition.
Fast return for pure ASCII strings.
Parameters:
    string $string: a valid UTF-8 string. Input is not validated.
Definition at line 82 of file UtfNormal.php.
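To make the difference between canonical composition and canonical decomposition concrete, the sketch below normalizes the same letter written two ways. It assumes PHP's intl extension is loaded and uses its Normalizer class as a stand-in for toNFC() and toNFD(), which need a MediaWiki installation; the helper name canonicalForms() is hypothetical.

```php
<?php
// Assumption: the intl extension is available. Normalizer stands in for
// UtfNormal::toNFC() / UtfNormal::toNFD() so the example runs on its own.

// Hypothetical helper: returns both canonical forms of a valid UTF-8 string.
function canonicalForms($string) {
    return array(
        'nfc' => Normalizer::normalize($string, Normalizer::FORM_C),
        'nfd' => Normalizer::normalize($string, Normalizer::FORM_D),
    );
}

$composed   = "\xC3\xA9";   // U+00E9, precomposed "e with acute"
$decomposed = "e\xCC\x81";  // "e" followed by U+0301 COMBINING ACUTE ACCENT

$a = canonicalForms($composed);
$b = canonicalForms($decomposed);

// Both spellings converge on the same byte sequence in each form.
var_dump($a['nfc'] === $b['nfc']); // bool(true)
var_dump($a['nfd'] === $b['nfd']); // bool(true)

// NFC is the 2-byte precomposed character, NFD the 3-byte sequence.
var_dump(strlen($a['nfc'])); // int(2)
var_dump(strlen($a['nfd'])); // int(3)
```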
static UtfNormal::toNFKC ($string)

Convert a UTF-8 string to normal form KC, compatibility composition.
This may cause irreversible information loss; use judiciously. Fast return for pure ASCII strings.
Parameters:
    string $string: a valid UTF-8 string. Input is not validated.
Definition at line 94 of file UtfNormal.php.
static UtfNormal::toNFKD ($string)

Convert a UTF-8 string to normal form KD, compatibility decomposition.
This may cause irreversible information loss; use judiciously. Fast return for pure ASCII strings.
Parameters:
    string $string: a valid UTF-8 string. Input is not validated.
Definition at line 106 of file UtfNormal.php.
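The information loss that the compatibility forms warn about can be seen with a ligature and a superscript digit. As a sketch only, this uses the Normalizer class from PHP's intl extension (assumed to be loaded) in place of toNFKC(); the helper name compatibilityCompose() is hypothetical.

```php
<?php
// Assumption: the intl extension is available. Normalizer stands in for
// UtfNormal::toNFKC() so the example runs without MediaWiki.

// Hypothetical helper: compatibility composition (NFKC) of a UTF-8 string.
function compatibilityCompose($string) {
    return Normalizer::normalize($string, Normalizer::FORM_KC);
}

$ligature = "\xEF\xAC\x81"; // U+FB01 LATIN SMALL LIGATURE FI
$squared  = "x\xC2\xB2";    // "x" followed by U+00B2 SUPERSCRIPT TWO

// The compatibility mapping replaces both with plain ASCII letters/digits.
var_dump(compatibilityCompose($ligature)); // string(2) "fi"
var_dump(compatibilityCompose($squared));  // string(2) "x2"

// Canonical composition (NFC) leaves the superscript untouched.
var_dump(Normalizer::normalize($squared, Normalizer::FORM_C) === $squared); // bool(true)
```

Once the text reads "x2", nothing in the string records that the digit was a superscript, so the original cannot be recovered from the normalized output.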