utfnormal
Unicode normalization for PHP
UtfNormal\Validator Class Reference

Unicode normalization routines for working with UTF-8 strings.

Static Public Member Functions

static cleanUp ( $string)
 The ultimate convenience function! Clean up invalid UTF-8 sequences, and convert to normal form C, canonical composition.
 
static toNFC ( $string)
 Convert a UTF-8 string to normal form C, canonical composition.
 
static toNFD ( $string)
 Convert a UTF-8 string to normal form D, canonical decomposition.
 
static toNFKC ( $string)
 Convert a UTF-8 string to normal form KC, compatibility composition.
 
static toNFKD ( $string)
 Convert a UTF-8 string to normal form KD, compatibility decomposition.
 
static loadData ()
 Load the basic composition data if necessary.
 
static quickIsNFC ( $string)
 Returns true if the string is definitely in NFC.
 
static quickIsNFCVerify (&$string)
 Returns true if the string is definitely in NFC.
 
static NFC ( $string)
 
static NFD ( $string)
 
static NFKC ( $string)
 
static NFKD ( $string)
 
static fastDecompose ( $string, $map)
 Perform decomposition of a UTF-8 string into either D or KD form (depending on which decomposition map is passed to us).
 
static fastCombiningSort ( $string)
 Sorts combining characters into canonical order.
 
static fastCompose ( $string)
 Produces canonically composed sequences, i.e. normal form C or KC.
 
static placebo ( $string)
 This is just used for the benchmark, comparing how long it takes to iterate through a string without really doing anything of substance.
 

Static Public Attributes

static $utfCombiningClass
 
static $utfCanonicalComp
 
static $utfCanonicalDecomp
 
static $utfCompatibilityDecomp
 
static $utfCheckNFC
 

Detailed Description

Unicode normalization routines for working with UTF-8 strings.

Currently, it assumes that input strings are valid UTF-8!

Not as fast as I'd like, but should be usable for most purposes. UtfNormal\Validator::toNFC() will bail early if given ASCII text or text it can quickly determine is already normalized.

All functions can be called statically.

See description of forms at http://www.unicode.org/reports/tr15/
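A minimal usage sketch of the entry points described above. It assumes the library is installed through Composer and that the autoloader sits at `vendor/autoload.php`; that path and the helper name `nfcOrNull` are assumptions for illustration, not part of the library. The helper returns null when the class cannot be loaded, so the script still runs without the library.

```php
<?php
// Assumption: Composer's autoloader lives at this path; adjust for your project.
$autoload = __DIR__ . '/vendor/autoload.php';
if ( is_file( $autoload ) ) {
	require_once $autoload;
}

/**
 * Hypothetical helper: clean up and normalize to NFC when
 * UtfNormal\Validator is available, otherwise return null.
 */
function nfcOrNull( string $text ): ?string {
	if ( !class_exists( 'UtfNormal\\Validator' ) ) {
		return null;
	}
	// cleanUp() also repairs invalid UTF-8; toNFC() assumes valid input.
	return \UtfNormal\Validator::cleanUp( $text );
}

// "e" followed by U+0301 COMBINING ACUTE ACCENT (a decomposed sequence).
$result = nfcOrNull( "e\u{0301}" );
if ( $result === null ) {
	echo "UtfNormal\\Validator is not installed\n";
} else {
	// In normal form C the pair composes to the single code point U+00E9.
	var_dump( $result === "\u{00E9}" );
}
```

Use cleanUp() for untrusted input and toNFC() only when the string is already known to be valid UTF-8.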

Member Function Documentation

◆ cleanUp()

static UtfNormal\Validator::cleanUp ( $string)

The ultimate convenience function! Clean up invalid UTF-8 sequences, and convert to normal form C, canonical composition.

Fast return for pure ASCII strings; some lesser optimizations for strings containing only known-good characters. Not as fast as toNFC().

Parameters
string $string a UTF-8 string
Returns
string a clean, shiny, normalized UTF-8 string
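The "fast return for pure ASCII strings" can be pictured as a byte-range test: a string with no byte at or above 0x80 is pure ASCII, and ASCII text is unchanged by every normal form. The following is a sketch of that idea only, not the library's own code; `isPureAscii` and `normalizeWithFastPath` are hypothetical names.

```php
<?php
/**
 * Hypothetical helper: true when the string holds only bytes 0x00-0x7F.
 */
function isPureAscii( string $text ): bool {
	// Without the /u modifier the pattern is matched byte by byte.
	return preg_match( '/[\x80-\xff]/', $text ) === 0;
}

/**
 * Hypothetical wrapper: skip the expensive normalizer for ASCII input.
 *
 * @param string $text
 * @param callable $slowNormalizer function ( string ): string
 */
function normalizeWithFastPath( string $text, callable $slowNormalizer ): string {
	if ( isPureAscii( $text ) ) {
		// ASCII is already in NFC, NFD, NFKC and NFKD.
		return $text;
	}
	return $slowNormalizer( $text );
}
```

The check costs one linear scan over the bytes, so it pays off whenever a large share of the input is plain ASCII.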

◆ fastCombiningSort()

static UtfNormal\Validator::fastCombiningSort ( $string)

Sorts combining characters into canonical order.

This is the final step in creating decomposed normal forms D and KD.

Parameters
string $string a valid, decomposed UTF-8 string. Input is not validated.
Returns
string a UTF-8 string with combining characters sorted in canonical order
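Canonical ordering, as defined in UAX #15, swaps two adjacent characters only when both have a non-zero combining class and the left class is greater than the right one. The sketch below illustrates that rule with a tiny hand-picked class table; the function name `sortCombiningSketch` is hypothetical, and a real implementation needs the full Unicode combining-class data (such as $utfCombiningClass).

```php
<?php
/**
 * Sketch of canonical ordering over a small excerpt of combining classes.
 * Input is assumed to be valid UTF-8, as with fastCombiningSort().
 */
function sortCombiningSketch( string $text ): string {
	// Excerpt only: canonical combining classes of four marks.
	$classes = [
		"\u{0327}" => 202, // COMBINING CEDILLA
		"\u{0323}" => 220, // COMBINING DOT BELOW
		"\u{0301}" => 230, // COMBINING ACUTE ACCENT
		"\u{0308}" => 230, // COMBINING DIAERESIS
	];
	// Split into code points; characters missing from the table count as class 0.
	$chars = preg_split( '//u', $text, -1, PREG_SPLIT_NO_EMPTY );
	$count = count( $chars );
	// Stable insertion sort: a mark moves left only past marks of a
	// strictly greater class, and never past a class-0 character.
	for ( $i = 1; $i < $count; $i++ ) {
		for ( $j = $i; $j > 0; $j-- ) {
			$left = $classes[$chars[$j - 1]] ?? 0;
			$right = $classes[$chars[$j]] ?? 0;
			if ( $right === 0 || $left <= $right ) {
				break;
			}
			[ $chars[$j - 1], $chars[$j] ] = [ $chars[$j], $chars[$j - 1] ];
		}
	}
	return implode( '', $chars );
}
```

For example, "a" followed by U+0301 and U+0323 is reordered so that U+0323 (class 220) precedes U+0301 (class 230), while marks of equal class keep their original order.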

◆ fastCompose()

static UtfNormal\Validator::fastCompose ( $string)

Produces canonically composed sequences, i.e. normal form C or KC.

Parameters
string $string a valid UTF-8 string in sorted normal form D or KD. Input is not validated.
Returns
string a UTF-8 string with canonical precomposed characters used where possible.

◆ fastDecompose()

static UtfNormal\Validator::fastDecompose ( $string, $map)

Perform decomposition of a UTF-8 string into either D or KD form (depending on which decomposition map is passed to us).

Input is assumed to be valid UTF-8. Invalid sequences will break it.

Parameters
string $string valid UTF-8 string
array $map hash of expanded decomposition map
Returns
string a UTF-8 string decomposed, not yet normalized (needs sorting)
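To illustrate how the choice of map selects the D or KD form, here is a sketch that replaces each code point with its entry in a lookup table. The function name `decomposeSketch` and both excerpt tables are hypothetical and cover only a handful of characters. Real decomposition data is far larger, and Hangul syllables are usually decomposed arithmetically rather than by table.

```php
<?php
/**
 * Sketch of map-driven decomposition. Input is assumed to be valid UTF-8.
 * The result still needs canonical ordering before it is a normal form.
 */
function decomposeSketch( string $text, array $map ): string {
	$out = '';
	foreach ( preg_split( '//u', $text, -1, PREG_SPLIT_NO_EMPTY ) as $char ) {
		// Characters without an entry are copied through unchanged.
		$out .= $map[$char] ?? $char;
	}
	return $out;
}

// Excerpt of canonical decompositions (used for form D).
$canonicalExcerpt = [
	"\u{00E9}" => "e\u{0301}", // LATIN SMALL LETTER E WITH ACUTE
	"\u{00C5}" => "A\u{030A}", // LATIN CAPITAL LETTER A WITH RING ABOVE
];

// Form KD uses the canonical entries plus compatibility mappings.
$compatExcerpt = $canonicalExcerpt + [
	"\u{FB01}" => "fi", // LATIN SMALL LIGATURE FI
	"\u{00B2}" => "2",  // SUPERSCRIPT TWO
];

var_dump( decomposeSketch( "caf\u{00E9}", $canonicalExcerpt ) === "cafe\u{0301}" );
var_dump( decomposeSketch( "\u{FB01}x\u{00B2}", $compatExcerpt ) === "fix2" );
```

Passing the canonical table leaves the ligature and the superscript untouched, which is the practical difference between forms D and KD.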

◆ NFC()

static UtfNormal\Validator::NFC ( $string)
Parameters
string $string
Returns
string

◆ NFD()

static UtfNormal\Validator::NFD ( $string)
Parameters
string $string
Returns
string

◆ NFKC()

static UtfNormal\Validator::NFKC ( $string)
Parameters
string $string
Returns
string

◆ NFKD()

static UtfNormal\Validator::NFKD ( $string)
Parameters
string $string
Returns
string

◆ placebo()

static UtfNormal\Validator::placebo ( $string)

This is just used for the benchmark, comparing how long it takes to iterate through a string without really doing anything of substance.

Parameters
string $string
Returns
string

◆ quickIsNFC()

static UtfNormal\Validator::quickIsNFC ( $string)

Returns true if the string is definitely in NFC.

Returns false if not or uncertain.

Parameters
string $string a valid UTF-8 string. Input is not validated.
Returns
bool

◆ quickIsNFCVerify()

static UtfNormal\Validator::quickIsNFCVerify ( &$string)

Returns true if the string is definitely in NFC.

Returns false if not or uncertain.

Parameters
string &$string A UTF-8 string, altered on output to be valid UTF-8 safe for XML.
Returns
bool

◆ toNFC()

static UtfNormal\Validator::toNFC ( $string)

Convert a UTF-8 string to normal form C, canonical composition.

Fast return for pure ASCII strings; some lesser optimizations for strings containing only known-good characters.

Parameters
string $string a valid UTF-8 string. Input is not validated.
Returns
string a UTF-8 string in normal form C

◆ toNFD()

static UtfNormal\Validator::toNFD ( $string)

Convert a UTF-8 string to normal form D, canonical decomposition.

Fast return for pure ASCII strings.

Parameters
string $string A valid UTF-8 string. Input is not validated.
Returns
string A UTF-8 string in normal form D

◆ toNFKC()

static UtfNormal\Validator::toNFKC ( $string)

Convert a UTF-8 string to normal form KC, compatibility composition.

This may cause irreversible information loss; use judiciously. Fast return for pure ASCII strings.

Parameters
string $string A valid UTF-8 string. Input is not validated.
Returns
string A UTF-8 string in normal form KC

◆ toNFKD()

static UtfNormal\Validator::toNFKD ( $string)

Convert a UTF-8 string to normal form KD, compatibility decomposition.

This may cause irreversible information loss; use judiciously. Fast return for pure ASCII strings.

Parameters
string $string a valid UTF-8 string. Input is not validated.
Returns
string a UTF-8 string in normal form KD

The documentation for this class was generated from the following file: