MediaWiki REL1_30
UtfNormal Class Reference

Unicode normalization routines for working with UTF-8 strings.

Static Public Member Functions

static cleanUp ( $string)
 The ultimate convenience function! Clean up invalid UTF-8 sequences, and convert to normal form C, canonical composition.
 
static quickIsNFC ( $string)
 Returns true if the string is definitely in NFC.
 
static quickIsNFCVerify (&$string)
 Returns true if the string is definitely in NFC. The string is altered on output to be valid UTF-8 safe for XML.
 
static toNFC ( $string)
 Convert a UTF-8 string to normal form C, canonical composition.
 
static toNFD ( $string)
 Convert a UTF-8 string to normal form D, canonical decomposition.
 
static toNFKC ( $string)
 Convert a UTF-8 string to normal form KC, compatibility composition.
 
static toNFKD ( $string)
 Convert a UTF-8 string to normal form KD, compatibility decomposition.
 

Detailed Description

Unicode normalization routines for working with UTF-8 strings.

Except for cleanUp() and quickIsNFCVerify(), the functions currently assume that input strings are valid UTF-8 and do not validate them.

Not as fast as I'd like, but should be usable for most purposes. UtfNormal::toNFC() will bail early if given ASCII text or text it can quickly determine is already normalized.

All functions can be called statically.

See description of forms at http://www.unicode.org/reports/tr15/

Deprecated
since 1.25; use UtfNormal\Validator directly.

Definition at line 48 of file UtfNormal.php.
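
A minimal migration sketch for the deprecation note above. It assumes that an autoloader already provides UtfNormal\Validator and that the class offers a static cleanUp() equivalent to the method documented on this page; normalizeInput() is a hypothetical helper name used only for illustration.

```php
<?php
// Assumption: an autoloader (for example Composer's) already provides
// UtfNormal\Validator. Nothing is loaded until the function is called.
use UtfNormal\Validator;

// Hypothetical wrapper, not part of MediaWiki.
function normalizeInput( $input ) {
	// Before (deprecated since 1.25):
	//   return UtfNormal::cleanUp( $input );
	// After:
	return Validator::cleanUp( $input );
}
```

The other static methods listed on this page would be migrated the same way, provided the replacement class exposes methods of the same names.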

Member Function Documentation

◆ cleanUp()

static UtfNormal::cleanUp ( $string)

The ultimate convenience function! Clean up invalid UTF-8 sequences, and convert to normal form C, canonical composition.

Fast return for pure ASCII strings; some lesser optimizations for strings containing only known-good characters. Not as fast as toNFC().

Parameters
string  $string  a UTF-8 string
Returns
string a clean, shiny, normalized UTF-8 string

Definition at line 59 of file UtfNormal.php.
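
The two steps described above (repair invalid sequences, then compose to NFC) can be sketched with PHP's own extensions. This is a stand-in, not the UtfNormal implementation: cleanUpSketch() is a hypothetical name, the code assumes the mbstring and intl extensions (mb_scrub() needs PHP 7.2 or later), and the replacement character it produces for invalid bytes may differ from what cleanUp() emits.

```php
<?php
// Stand-in sketch, NOT UtfNormal's implementation.
function cleanUpSketch( $string ) {
	// Fast path: pure ASCII is always valid UTF-8 and already in NFC.
	if ( !preg_match( '/[\x80-\xff]/', $string ) ) {
		return $string;
	}
	// Step 1: replace byte sequences that are not valid UTF-8.
	// mbstring's default substitute character is used here.
	$valid = mb_scrub( $string, 'UTF-8' );
	// Step 2: convert to normal form C (canonical composition).
	return Normalizer::normalize( $valid, Normalizer::FORM_C );
}

$dirty = "abc\xFFdef"; // the byte 0xFF can never appear in UTF-8
$clean = cleanUpSketch( $dirty );
var_dump( mb_check_encoding( $clean, 'UTF-8' ) ); // bool(true)
```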

◆ quickIsNFC()

static UtfNormal::quickIsNFC ( $string)

Returns true if the string is definitely in NFC.

Returns false if the string is not in NFC, or if the quick check cannot tell.

Parameters
string  $string  a valid UTF-8 string. Input is not validated.
Returns
bool

Definition at line 116 of file UtfNormal.php.
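
Because a false result does not prove the string is unnormalized, the usual pattern is to check first and normalize only when the check fails. The sketch below illustrates that pattern with the intl extension's Normalizer class as a stand-in; ensureNfc() is a hypothetical helper, and no claim is made that intl's check behaves like quickIsNFC() internally.

```php
<?php
// Stand-in sketch using PHP's intl extension, not UtfNormal itself.
function ensureNfc( $string ) {
	if ( Normalizer::isNormalized( $string, Normalizer::FORM_C ) ) {
		// Definitely NFC: return the input untouched.
		return $string;
	}
	// Treated as "not NFC or unsure": normalizing is safe either way.
	return Normalizer::normalize( $string, Normalizer::FORM_C );
}

$plain = ensureNfc( 'plain' );          // returned unchanged
$fixed = ensureNfc( "e\xCC\x81" );      // composed to U+00E9
```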

◆ quickIsNFCVerify()

static UtfNormal::quickIsNFCVerify ( & $string)

Returns true if the string is definitely in NFC.

Returns false if the string is not in NFC, or if the quick check cannot tell.

Parameters
string  &$string  a UTF-8 string, altered on output to be valid UTF-8 safe for XML.
Returns
bool

Definition at line 126 of file UtfNormal.php.

◆ toNFC()

static UtfNormal::toNFC ( $string)

Convert a UTF-8 string to normal form C, canonical composition.

Fast return for pure ASCII strings; some lesser optimizations for strings containing only known-good characters.

Parameters
string  $string  a valid UTF-8 string. Input is not validated.
Returns
string a UTF-8 string in normal form C

Definition at line 71 of file UtfNormal.php.
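
To make canonical composition concrete, the sketch below composes "e" followed by a combining acute accent into the single code point U+00E9, then decomposes it again. It uses the intl extension's Normalizer class as a stand-in for toNFC() and toNFD(); the sample string is an assumption chosen for illustration.

```php
<?php
// Stand-in sketch using PHP's intl extension, not UtfNormal itself.
$decomposed = "e\xCC\x81"; // U+0065 followed by U+0301 (3 bytes)

// Form C: canonical composition.
$composed = Normalizer::normalize( $decomposed, Normalizer::FORM_C );
var_dump( bin2hex( $composed ) ); // string(4) "c3a9", i.e. U+00E9

// Form D: canonical decomposition restores the original sequence.
$back = Normalizer::normalize( $composed, Normalizer::FORM_D );
var_dump( $back === $decomposed ); // bool(true)
```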

◆ toNFD()

static UtfNormal::toNFD ( $string)

Convert a UTF-8 string to normal form D, canonical decomposition.

Fast return for pure ASCII strings.

Parameters
string  $string  a valid UTF-8 string. Input is not validated.
Returns
string a UTF-8 string in normal form D

Definition at line 82 of file UtfNormal.php.

◆ toNFKC()

static UtfNormal::toNFKC ( $string)

Convert a UTF-8 string to normal form KC, compatibility composition.

This may cause irreversible information loss; use judiciously. Fast return for pure ASCII strings.

Parameters
string  $string  a valid UTF-8 string. Input is not validated.
Returns
string a UTF-8 string in normal form KC

Definition at line 94 of file UtfNormal.php.
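
The information loss mentioned above can be seen with a compatibility character such as the "fi" ligature (U+FB01). The sketch uses the intl extension's Normalizer class as a stand-in for toNFC() and toNFKC(); the sample character is an assumption chosen for illustration.

```php
<?php
// Stand-in sketch using PHP's intl extension, not UtfNormal itself.
$ligature = "\xEF\xAC\x81"; // U+FB01 LATIN SMALL LIGATURE FI

// Canonical composition leaves the ligature alone.
$nfc = Normalizer::normalize( $ligature, Normalizer::FORM_C );
var_dump( $nfc === $ligature ); // bool(true)

// Compatibility composition replaces it with two plain letters.
$nfkc = Normalizer::normalize( $ligature, Normalizer::FORM_KC );
var_dump( $nfkc ); // string(2) "fi"

// Normalizing the result again does not bring the ligature back.
$again = Normalizer::normalize( $nfkc, Normalizer::FORM_C );
var_dump( $again === $ligature ); // bool(false)
```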

◆ toNFKD()

static UtfNormal::toNFKD ( $string)

Convert a UTF-8 string to normal form KD, compatibility decomposition.

This may cause irreversible information loss; use judiciously. Fast return for pure ASCII strings.

Parameters
string  $string  a valid UTF-8 string. Input is not validated.
Returns
string a UTF-8 string in normal form KD

Definition at line 106 of file UtfNormal.php.


The documentation for this class was generated from the following file: