MediaWiki 1.32.5
Unicode normalization routines for working with UTF-8 strings.
Static Public Member Functions
static cleanUp($string)
    The ultimate convenience function! Clean up invalid UTF-8 sequences, and convert to normal form C, canonical composition.
static quickIsNFC($string)
    Returns true if the string is definitely in NFC.
static quickIsNFCVerify(&$string)
    Returns true if the string is definitely in NFC.
static toNFC($string)
    Convert a UTF-8 string to normal form C, canonical composition.
static toNFD($string)
    Convert a UTF-8 string to normal form D, canonical decomposition.
static toNFKC($string)
    Convert a UTF-8 string to normal form KC, compatibility composition.
static toNFKD($string)
    Convert a UTF-8 string to normal form KD, compatibility decomposition.
Unicode normalization routines for working with UTF-8 strings.
Most routines currently assume that input strings are valid UTF-8 and do not validate them; cleanUp() and quickIsNFCVerify() are the exceptions, since they repair invalid sequences.
Not as fast as I'd like, but should be usable for most purposes. UtfNormal::toNFC() returns early if given ASCII text or text it can quickly determine is already normalized.
All functions can be called statically.
See description of forms at https://www.unicode.org/reports/tr15/
Definition at line 48 of file UtfNormal.php.
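To make the canonical forms concrete, here is a minimal sketch of composition (NFC) and decomposition (NFD). It uses Python's standard `unicodedata` module rather than UtfNormal itself, so it illustrates the Unicode forms described in TR15, not MediaWiki's implementation; the helper `code_points` is a hypothetical name introduced for display only.

```python
import unicodedata

# "é" spelled as a base letter plus a combining acute accent (decomposed).
decomposed = "e\u0301"
# The same character as a single precomposed code point.
composed = "\u00e9"

nfc = unicodedata.normalize("NFC", decomposed)  # canonical composition
nfd = unicodedata.normalize("NFD", composed)    # canonical decomposition


def code_points(s):
    """Hypothetical display helper: list the code points of a string."""
    return [f"U+{ord(ch):04X}" for ch in s]


print(code_points(nfc))  # ['U+00E9']
print(code_points(nfd))  # ['U+0065', 'U+0301']
```

Both spellings render identically, but they compare unequal as strings until they are brought to the same normal form.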
static UtfNormal::cleanUp($string)
The ultimate convenience function! Clean up invalid UTF-8 sequences, and convert to normal form C, canonical composition.
Fast return for pure ASCII strings; some lesser optimizations for strings containing only known-good characters. Not as fast as toNFC().
Parameters
    string $string    a UTF-8 string
Definition at line 59 of file UtfNormal.php.
References wfDeprecated().
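The clean-then-compose behaviour described for cleanUp() can be sketched as follows. This is a Python stand-in under stated assumptions, not MediaWiki's code: `clean_up` is a hypothetical name, and replacing invalid bytes with U+FFFD is the behaviour of Python's `errors="replace"` decoder, which this page does not confirm for UtfNormal.

```python
import unicodedata


def clean_up(raw: bytes) -> str:
    """Hypothetical sketch of the cleanUp() idea: repair, then normalize to NFC."""
    # Fast path: pure ASCII is always valid UTF-8 and already in NFC.
    if raw.isascii():
        return raw.decode("ascii")
    # Assumption: each invalid byte sequence becomes U+FFFD (Python's choice).
    text = raw.decode("utf-8", errors="replace")
    # Canonical composition of whatever survived decoding.
    return unicodedata.normalize("NFC", text)


print(clean_up(b"plain ascii"))             # plain ascii
print(clean_up(b"e\xcc\x81") == "\u00e9")   # True
print(clean_up(b"a\xffb") == "a\ufffdb")    # True
```

The ASCII check mirrors the "fast return for pure ASCII strings" noted above; the extra decoding pass is one reason a routine like this would be slower than plain NFC conversion of already valid input.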
static UtfNormal::quickIsNFC($string)
Returns true if the string is definitely in NFC.
Returns false if it is not, or if the result is uncertain.
Parameters
    string $string    a valid UTF-8 string. Input is not validated.
Definition at line 121 of file UtfNormal.php.
References wfDeprecated().
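A quick check of this kind is deliberately one-sided: true means "certainly NFC", false means "not NFC, or unknown". The sketch below shows that contract in Python with the most conservative possible test (ASCII only). The names `quick_is_nfc` and `to_nfc` are hypothetical, and the real routine presumably recognizes far more input as already normalized.

```python
import unicodedata


def quick_is_nfc(text: str) -> bool:
    """Conservative sketch: True only when the text is certainly in NFC."""
    # ASCII characters never change under any normalization form.
    # Anything else is reported as "not NFC or uncertain".
    return text.isascii()


def to_nfc(text: str) -> str:
    """Skip the expensive pass whenever the quick check is conclusive."""
    if quick_is_nfc(text):
        return text
    return unicodedata.normalize("NFC", text)


print(quick_is_nfc("hello"))          # True
print(quick_is_nfc("\u00e9"))         # False (already NFC, but the check cannot tell)
print(to_nfc("e\u0301") == "\u00e9")  # True
```

A false result costs only time, because the caller falls back to full normalization; a wrong true result would leave unnormalized text in place, which is why the check may only err toward false.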
static UtfNormal::quickIsNFCVerify(&$string)
Returns true if the string is definitely in NFC.
Returns false if it is not, or if the result is uncertain.
Parameters
    string &$string    a UTF-8 string, passed by reference and altered on output to be valid UTF-8 that is safe for XML.
Definition at line 132 of file UtfNormal.php.
References wfDeprecated().
static UtfNormal::toNFC($string)
Convert a UTF-8 string to normal form C, canonical composition.
Fast return for pure ASCII strings; some lesser optimizations for strings containing only known-good characters.
Parameters
    string $string    a valid UTF-8 string. Input is not validated.
Definition at line 72 of file UtfNormal.php.
References wfDeprecated().
static UtfNormal::toNFD($string)
Convert a UTF-8 string to normal form D, canonical decomposition.
Fast return for pure ASCII strings.
Parameters
    string $string    a valid UTF-8 string. Input is not validated.
Definition at line 84 of file UtfNormal.php.
References wfDeprecated().
static UtfNormal::toNFKC($string)
Convert a UTF-8 string to normal form KC, compatibility composition.
This may cause irreversible information loss; use judiciously. Fast return for pure ASCII strings.
Parameters
    string $string    a valid UTF-8 string. Input is not validated.
Definition at line 97 of file UtfNormal.php.
References wfDeprecated().
static UtfNormal::toNFKD($string)
Convert a UTF-8 string to normal form KD, compatibility decomposition.
This may cause irreversible information loss; use judiciously. Fast return for pure ASCII strings.
Parameters
    string $string    a valid UTF-8 string. Input is not validated.
Definition at line 110 of file UtfNormal.php.
References wfDeprecated().
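The information-loss warning on the compatibility forms can be seen in a small example. As before, this is a sketch using Python's standard `unicodedata` module to illustrate the Unicode forms, not a call into UtfNormal; the sample string is an assumption chosen for illustration.

```python
import unicodedata

# "ﬁle x²": U+FB01 (fi ligature) and U+00B2 (superscript two).
original = "\ufb01le x\u00b2"

nfc = unicodedata.normalize("NFC", original)    # canonical: keeps both characters
nfkc = unicodedata.normalize("NFKC", original)  # compatibility composition
nfkd = unicodedata.normalize("NFKD", original)  # compatibility decomposition

print(nfc == original)  # True
print(nfkc)             # file x2
print(nfkd)             # file x2
```

After compatibility normalization the superscript is an ordinary digit and the ligature is two letters, and nothing in the result records that they were ever different. That is the irreversible loss the descriptions of toNFKC() and toNFKD() warn about.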