Unicode normalization routines for working with UTF-8 strings.
|
static | cleanUp ( $string) |
| The ultimate convenience function! Clean up invalid UTF-8 sequences, and convert to normal form C, canonical composition.
|
|
static | toNFC ( $string) |
| Convert a UTF-8 string to normal form C, canonical composition.
|
|
static | toNFD ( $string) |
| Convert a UTF-8 string to normal form D, canonical decomposition.
|
|
static | toNFKC ( $string) |
| Convert a UTF-8 string to normal form KC, compatibility composition.
|
|
static | toNFKD ( $string) |
| Convert a UTF-8 string to normal form KD, compatibility decomposition.
|
|
static | loadData () |
| Load the basic composition data if necessary.
|
|
static | quickIsNFC ( $string) |
| Returns true if the string is definitely in NFC.
|
|
static | quickIsNFCVerify (&$string) |
| Returns true if the string is definitely in NFC.
|
|
static | NFC ( $string) |
|
static | NFD ( $string) |
|
static | NFKC ( $string) |
|
static | NFKD ( $string) |
|
static | fastDecompose ( $string, $map) |
| Perform decomposition of a UTF-8 string into either D or KD form (depending on which decomposition map is passed to us).
|
|
static | fastCombiningSort ( $string) |
| Sorts combining characters into canonical order.
|
|
static | fastCompose ( $string) |
| Produces canonically composed sequences, i.e. normal form C or KC.
|
|
static | placebo ( $string) |
| This is just used for the benchmark, comparing how long it takes to iterate through a string without really doing anything of substance.
|
|
|
static | $utfCombiningClass |
|
static | $utfCanonicalComp |
|
static | $utfCanonicalDecomp |
|
static | $utfCompatibilityDecomp |
|
static | $utfCheckNFC |
|
Unicode normalization routines for working with UTF-8 strings.
Currently, it assumes that input strings are valid UTF-8!
Not as fast as I'd like, but should be usable for most purposes. UtfNormal\Validator::toNFC() will bail early if given ASCII text or text it can quickly determine is already normalized.
All functions can be called statically.
See description of forms at http://www.unicode.org/reports/tr15/
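A minimal usage sketch of the static call style described above. It assumes the class is loadable through an autoloader (the autoload setup is an assumption, not something this page states), and the helper name `normalizeForStorage` is hypothetical. When the class is not available, the sketch simply returns its input unchanged.

```php
<?php
// Hypothetical helper: normalize to NFC when UtfNormal\Validator is loadable,
// otherwise return the input unchanged. Assumes $text is already valid UTF-8.
function normalizeForStorage(string $text): string
{
    if (class_exists('UtfNormal\\Validator')) {
        // Static call, as documented on this page.
        return \UtfNormal\Validator::toNFC($text);
    }
    return $text; // fallback for this sketch only
}

echo normalizeForStorage("plain ASCII"), "\n";
```

Pure ASCII input takes the fast return path, so the call above should be cheap either way.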
◆ cleanUp()
static UtfNormal\Validator::cleanUp ( $string )
The ultimate convenience function! Clean up invalid UTF-8 sequences, and convert to normal form C, canonical composition.
Fast return for pure ASCII strings; some lesser optimizations for strings containing only known-good characters. Not as fast as toNFC().
- Parameters
-
string | $string | a UTF-8 string |
- Returns
- string a clean, shiny, normalized UTF-8 string
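The cleanup half of this function can be pictured with a small sketch. This is not the library's code: it is a regex-based stand-in that keeps well-formed UTF-8 sequences and replaces every other byte with U+FFFD. The choice of U+FFFD as the replacement and the name `scrubUtf8Sketch` are assumptions made for illustration; the normalization to form C is not covered here.

```php
<?php
// Sketch only: replace every byte that is not part of a well-formed UTF-8
// sequence with U+FFFD. The replacement character is an assumption here.
function scrubUtf8Sketch(string $string): string
{
    $pattern = '/(
          [\x00-\x7F]                        # ASCII
        | [\xC2-\xDF][\x80-\xBF]             # 2-byte sequence
        | \xE0[\xA0-\xBF][\x80-\xBF]         # 3-byte, no overlong forms
        | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}  # 3-byte
        | \xED[\x80-\x9F][\x80-\xBF]         # 3-byte, no surrogates
        | \xF0[\x90-\xBF][\x80-\xBF]{2}      # 4-byte, no overlong forms
        | [\xF1-\xF3][\x80-\xBF]{3}          # 4-byte
        | \xF4[\x80-\x8F][\x80-\xBF]{2}      # 4-byte, up to U+10FFFF
        ) | (.)/xs';

    return preg_replace_callback(
        $pattern,
        function (array $m): string {
            // Group 1 holds a well-formed sequence; otherwise group 2 caught a stray byte.
            return $m[1] !== '' ? $m[1] : "\u{FFFD}";
        },
        $string
    );
}
```

Valid input passes through untouched, which is why a cleanup step like this can sit in front of normalization without changing well-formed text.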
◆ fastCombiningSort()
static UtfNormal\Validator::fastCombiningSort ( $string )
Sorts combining characters into canonical order.
This is the final step in creating decomposed normal forms D and KD.
- Parameters
-
string | $string | a valid, decomposed UTF-8 string. Input is not validated. |
- Returns
- string a UTF-8 string with combining characters sorted in canonical order
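Canonical ordering can be sketched as a stable sort of each run of combining marks by combining class. The three-entry table and the function name below are hypothetical and are not the library's data or implementation; like the documented function, the sketch assumes valid UTF-8.

```php
<?php
// Tiny, hypothetical excerpt of a combining-class table (not the library's data).
// Keys are UTF-8 characters, values are canonical combining classes.
const SKETCH_COMBINING_CLASS = [
    "\u{0301}" => 230, // COMBINING ACUTE ACCENT
    "\u{0323}" => 220, // COMBINING DOT BELOW
    "\u{0327}" => 202, // COMBINING CEDILLA
];

function combiningSortSketch(string $string): string
{
    // Split into code points; assumes valid UTF-8.
    $chars = preg_split('//u', $string, -1, PREG_SPLIT_NO_EMPTY);
    $n = count($chars);

    // Stable insertion sort: a mark moves left only past marks of a higher class.
    for ($i = 1; $i < $n; $i++) {
        $class = SKETCH_COMBINING_CLASS[$chars[$i]] ?? 0;
        if ($class === 0) {
            continue; // starters (class 0) never move
        }
        $j = $i;
        while ($j > 0 && (SKETCH_COMBINING_CLASS[$chars[$j - 1]] ?? 0) > $class) {
            [$chars[$j - 1], $chars[$j]] = [$chars[$j], $chars[$j - 1]];
            $j--;
        }
    }
    return implode('', $chars);
}
```

Because a starter has class 0, the loop condition never lets a mark cross it, and marks of equal class keep their relative order.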
◆ fastCompose()
static UtfNormal\Validator::fastCompose ( $string )
Produces canonically composed sequences, i.e. normal form C or KC.
- Parameters
-
string | $string | a valid UTF-8 string in sorted normal form D or KD. Input is not validated. |
- Returns
- string a UTF-8 string with canonical precomposed characters used where possible.
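As a rough illustration of composition over sorted, decomposed input, the sketch below follows the general starter-plus-mark scheme from the Unicode normalization report: a mark is merged into the last starter when a precomposed form exists and no mark of equal or higher class stands in between. Both tables and the function name are hypothetical miniatures, not the library's data or code.

```php
<?php
// Hypothetical miniature tables (not the library's data).
const SKETCH_COMPOSE = [
    "e\u{0301}" => "\u{00E9}", // e + COMBINING ACUTE ACCENT -> U+00E9
    "a\u{0323}" => "\u{1EA1}", // a + COMBINING DOT BELOW    -> U+1EA1
];
const SKETCH_CLASS = [
    "\u{0301}" => 230,
    "\u{0323}" => 220,
];

function composeSketch(string $string): string
{
    $out = [];
    $starter = -1;  // index in $out of the most recent starter, -1 if none yet
    $lastClass = 0; // combining class of the last character appended to $out

    foreach (preg_split('//u', $string, -1, PREG_SPLIT_NO_EMPTY) as $ch) {
        $class = SKETCH_CLASS[$ch] ?? 0;

        // Try to combine with the starter unless an earlier mark blocks it.
        if ($starter >= 0 && ($lastClass < $class || $lastClass === 0)) {
            $composed = SKETCH_COMPOSE[$out[$starter] . $ch] ?? null;
            if ($composed !== null) {
                $out[$starter] = $composed;
                continue;
            }
        }
        if ($class === 0) {
            $starter = count($out); // this character becomes the new starter
        }
        $lastClass = $class;
        $out[] = $ch;
    }
    return implode('', $out);
}
```

Marks with no precomposed form are left in place after the starter, so the output stays canonically equivalent to the input.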
◆ fastDecompose()
static UtfNormal\Validator::fastDecompose ( $string, $map )
Perform decomposition of a UTF-8 string into either D or KD form (depending on which decomposition map is passed to us).
Input is assumed to be valid UTF-8; invalid input will break the decomposition.
- Parameters
-
string | $string | valid UTF-8 string |
array | $map | hash of expanded decomposition map |
- Returns
- string a UTF-8 string decomposed, not yet normalized (needs sorting)
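To show what an expanded decomposition map buys, here is a sketch that swaps in PHP's `strtr()` for the actual decomposition loop. The two-entry map and the function name are hypothetical. Because `strtr()` never rescans its own output, every entry has to be decomposed all the way down already, which is what "expanded" is taken to mean here.

```php
<?php
// Hypothetical miniature of an expanded canonical decomposition map
// (two illustrative entries, not the library's data).
$sketchMap = [
    "\u{00E9}" => "e\u{0301}",         // U+00E9 -> e + COMBINING ACUTE ACCENT
    "\u{1E69}" => "s\u{0323}\u{0307}", // U+1E69 -> s + DOT BELOW + DOT ABOVE, fully expanded
];

function decomposeSketch(string $string, array $map): string
{
    // strtr() with an array replaces each key once and does not rescan its
    // own output, so the map must already be fully expanded.
    return strtr($string, $map);
}

echo bin2hex(decomposeSketch("caf\u{00E9}", $sketchMap)), "\n";
```

As with the documented function, the result is only decomposed; combining marks still need to be sorted into canonical order afterwards.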
◆ NFC()
static UtfNormal\Validator::NFC ( $string )
◆ NFD()
static UtfNormal\Validator::NFD ( $string )
◆ NFKC()
static UtfNormal\Validator::NFKC ( $string )
◆ NFKD()
static UtfNormal\Validator::NFKD ( $string )
◆ placebo()
static UtfNormal\Validator::placebo ( $string )
This is just used for the benchmark, comparing how long it takes to iterate through a string without really doing anything of substance.
- Parameters
-
- Returns
- string
◆ quickIsNFC()
static UtfNormal\Validator::quickIsNFC ( $string )
Returns true if the string is definitely in NFC.
Returns false if not or uncertain.
- Parameters
-
string | $string | a valid UTF-8 string. Input is not validated. |
- Returns
- bool
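The "true only when certain" contract can be sketched with a lookup table of characters that rule out or cast doubt on NFC. The two-entry table and the function name are hypothetical; the sketch assumes the table would be complete in real use, and it leaves out the check that combining marks are in canonical order, which a full quick check also needs.

```php
<?php
// Hypothetical miniature quick-check table: characters that never occur in
// NFC ("NO") or that need context to decide ("MAYBE"). Not the library's data.
const SKETCH_CHECK_NFC = [
    "\u{0340}" => 'NO',    // COMBINING GRAVE TONE MARK
    "\u{0301}" => 'MAYBE', // COMBINING ACUTE ACCENT
];

function quickIsNfcSketch(string $string): bool
{
    // Pure ASCII is always in NFC.
    if (!preg_match('/[\x80-\xff]/', $string)) {
        return true;
    }
    // Assumes valid UTF-8, like the documented function.
    foreach (preg_split('//u', $string, -1, PREG_SPLIT_NO_EMPTY) as $ch) {
        if (array_key_exists($ch, SKETCH_CHECK_NFC)) {
            return false; // "NO" and "MAYBE" both mean: not certain
        }
    }
    return true;
}
```

A false result therefore means "run the full normalization", not "the string is wrong".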
◆ quickIsNFCVerify()
static UtfNormal\Validator::quickIsNFCVerify ( &$string )
Returns true if the string is definitely in NFC.
Returns false if not or uncertain.
- Parameters
-
string | &$string | A UTF-8 string, altered on output to be valid UTF-8 safe for XML. |
- Returns
- bool
◆ toNFC()
static UtfNormal\Validator::toNFC ( $string )
Convert a UTF-8 string to normal form C, canonical composition.
Fast return for pure ASCII strings; some lesser optimizations for strings containing only known-good characters.
- Parameters
-
string | $string | a valid UTF-8 string. Input is not validated. |
- Returns
- string a UTF-8 string in normal form C
◆ toNFD()
static UtfNormal\Validator::toNFD ( $string )
Convert a UTF-8 string to normal form D, canonical decomposition.
Fast return for pure ASCII strings.
- Parameters
-
string | $string | A valid UTF-8 string. Input is not validated. |
- Returns
- string A UTF-8 string in normal form D
◆ toNFKC()
static UtfNormal\Validator::toNFKC ( $string )
Convert a UTF-8 string to normal form KC, compatibility composition.
This may cause irreversible information loss; use judiciously. Fast return for pure ASCII strings.
- Parameters
-
string | $string | A valid UTF-8 string. Input is not validated. |
- Returns
- string A UTF-8 string in normal form KC
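The information loss is easy to see with a toy compatibility mapping. The two-entry map below is a hypothetical excerpt applied with `strtr()`, not a call into the library; it only illustrates why the result cannot be mapped back.

```php
<?php
// Hypothetical two-entry excerpt of a compatibility decomposition map.
$sketchCompat = [
    "\u{FB01}" => "fi", // LATIN SMALL LIGATURE FI
    "\u{00B2}" => "2",  // SUPERSCRIPT TWO
];

$a = strtr("x\u{00B2}", $sketchCompat); // "x" + SUPERSCRIPT TWO becomes "x2"
$b = strtr("x2", $sketchCompat);        // plain "x2" stays "x2"

// Two different inputs collapse to one output, so the original cannot be recovered.
var_dump($a === $b); // bool(true)
```

Once both strings read "x2", nothing in the output records which of them carried the superscript.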
◆ toNFKD()
static UtfNormal\Validator::toNFKD ( $string )
Convert a UTF-8 string to normal form KD, compatibility decomposition.
This may cause irreversible information loss; use judiciously. Fast return for pure ASCII strings.
- Parameters
-
string | $string | a valid UTF-8 string. Input is not validated. |
- Returns
- string a UTF-8 string in normal form KD
The documentation for this class was generated from the following file: