MediaWiki REL1_31
IcuCollation Class Reference

Public Member Functions

 __construct ( $locale)
 
 getFirstLetter ( $string)
 Given a string, return the logical "first letter" to be used for grouping on category pages and so on.
 
 getFirstLetterCount ()
 
 getFirstLetterData ()
 
 getLetterByIndex ( $index)
 
 getPrimarySortKey ( $string)
 
 getSortKey ( $string)
 Given a string, convert it to a (hopefully short) key that can be used for efficient sorting.
 
 getSortKeyByLetterIndex ( $index)
 

Static Public Member Functions

static getICUVersion ()
 Return the version of the ICU library used by PHP's intl extension, or false when the extension is not installed or the version can't be determined.
 
static getUnicodeVersionForICU ()
 Return the version of Unicode appropriate for the version of the ICU library currently in use, or false when it can't be determined.
 
static isCjk ( $codepoint)
 Test if a code point is a CJK (Chinese, Japanese, Korean) character.
 
- Static Public Member Functions inherited from Collation
static factory ( $collationName)
 
static singleton ()
 

Public Attributes

const FIRST_LETTER_VERSION = 3
 
const RECORD_LENGTH = 14
 

Protected Attributes

Language $digitTransformLanguage
 

Private Member Functions

 fetchFirstLetterData ()
 

Private Attributes

array $firstLetterData
 
string $locale
 
Collator $mainCollator
 
Collator $primaryCollator
 
bool $useNumericCollation = false
 

Static Private Attributes

static $cjkBlocks
 Unified CJK blocks.
 
static $tailoringFirstLetters
 Additional characters (or character groups) to be considered separate letters for given languages, or to be removed from the list of such letters (denoted by keys starting with '-').
 

Detailed Description

Since
1.16.3

Definition at line 24 of file IcuCollation.php.

Constructor & Destructor Documentation

◆ __construct()

IcuCollation::__construct (   $locale)

Definition at line 246 of file IcuCollation.php.

References $locale, and Language\factory().

Member Function Documentation

◆ fetchFirstLetterData()

IcuCollation::fetchFirstLetterData ( )
private
Returns
array
Exceptions
MWException

Definition at line 350 of file IcuCollation.php.

References getPrimarySortKey(), utf8ToCodepoint(), wfDebug(), and wfGetPrecompiledData().

Referenced by getFirstLetterData().

◆ getFirstLetter()

IcuCollation::getFirstLetter (   $string)

Given a string, return the logical "first letter" to be used for grouping on category pages and so on.

This has to be coordinated carefully with getSortKey(); otherwise the sorted list might jump back and forth between the same "initial letters" or show other pathological behavior. For instance, if you just return the first character, but "a" sorts the same as "A" based on getSortKey(), then you might get a list like

== A ==

  • [[Aardvark]]

== a ==

  • [[antelope]]

== A ==

  • [[Ape]]

etc., assuming for the sake of argument that $wgCapitalLinks is false.

Since
1.16.3
Parameters
string $string: UTF-8 string
Returns
string UTF-8 string corresponding to the first letter of input

Reimplemented from Collation.

Reimplemented in CollationFa, and CollationEt.

Definition at line 283 of file IcuCollation.php.

References ArrayUtils\findLowerBound(), getFirstLetterCount(), getLetterByIndex(), getPrimarySortKey(), utf8ToCodepoint(), and wfMessage().
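A minimal sketch of the grouping idea, in plain PHP so it runs without the intl extension. It assumes the first-letter headings are kept in sorted order together with their primary-strength keys, and it substitutes a hypothetical ASCII-only `primaryKeySketch()` (case folding via `strtolower()`) for the ICU primary collator. The hypothetical `lowerBoundSketch()` plays the role the reference list above attributes to `ArrayUtils\findLowerBound()`; that method's real signature is not shown on this page, so this is a stand-in, not MediaWiki's code. Because "a" and "A" get the same primary key, all three titles from the example land under a single "A" heading.

```php
<?php
// Hypothetical stand-in for a primary-strength collation key:
// ASCII case folding only, so "A" and "a" compare equal.
function primaryKeySketch(string $s): string
{
    return strtolower($s);
}

// Binary search: index of the last key in $sortedKeys that is <= $key,
// or null if $key sorts before every key.
function lowerBoundSketch(array $sortedKeys, string $key): ?int
{
    $lo = 0;
    $hi = count($sortedKeys) - 1;
    $found = null;
    while ($lo <= $hi) {
        $mid = intdiv($lo + $hi, 2);
        if (strcmp($sortedKeys[$mid], $key) <= 0) {
            $found = $mid;   // candidate; keep looking to the right
            $lo = $mid + 1;
        } else {
            $hi = $mid - 1;
        }
    }
    return $found;
}

// Return the heading with the greatest key not exceeding the title's key,
// or '' when the title sorts before the first heading.
function firstLetterSketch(array $headings, string $title): string
{
    $keys = array_map('primaryKeySketch', $headings); // headings assumed sorted
    $i = lowerBoundSketch($keys, primaryKeySketch($title));
    return $i === null ? '' : $headings[$i];
}

$headings = ['A', 'B', 'C'];
foreach (['Aardvark', 'antelope', 'Ape', 'Bee'] as $title) {
    echo firstLetterSketch($headings, $title), ' <- ', $title, "\n";
}
```

Searching keys rather than raw characters is what keeps the headings consistent with the sort order: any two strings that compare equal at the primary level are guaranteed to get the same heading.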

◆ getFirstLetterCount()

IcuCollation::getFirstLetterCount ( )
Returns
int
Since
1.16.3

Definition at line 506 of file IcuCollation.php.

References getFirstLetterData().

Referenced by getFirstLetter().

◆ getFirstLetterData()

IcuCollation::getFirstLetterData ( )

◆ getICUVersion()

static IcuCollation::getICUVersion ( )
static

Return the version of the ICU library used by PHP's intl extension, or false when the extension is not installed or the version can't be determined.

The constant INTL_ICU_VERSION that this function relies on is not well documented. It is available since PHP 5.3.7 (see PHP bug 54561, https://bugs.php.net/bug.php?id=54561); on older PHP versions this function returns false.

TODO: Remove this backwards-compatibility code, as MediaWiki now requires a newer version of PHP.

Since
1.21
Returns
string|bool

Definition at line 541 of file IcuCollation.php.

Referenced by GenerateCollationData\execute(), getUnicodeVersionForICU(), and SpecialVersion\softwareInformation().
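A minimal sketch of the lookup described above, assuming only what the text states: the version comes from the `INTL_ICU_VERSION` constant, which exists only when the intl extension is loaded. The function name `icuVersionSketch()` is hypothetical, and this is not necessarily how MediaWiki writes it.

```php
<?php
// INTL_ICU_VERSION is defined only when the intl extension is loaded
// (PHP 5.3.7+); otherwise report the version as unknown with false.
function icuVersionSketch()
{
    return defined('INTL_ICU_VERSION') ? INTL_ICU_VERSION : false;
}

$version = icuVersionSketch();
echo $version === false ? "ICU version unknown\n" : "ICU $version\n";
```

Checking with `defined()` first avoids referencing an undefined constant, which is an error in recent PHP versions.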

◆ getLetterByIndex()

IcuCollation::getLetterByIndex (   $index)
Parameters
string $index
Returns
string
Since
1.16.3

Definition at line 489 of file IcuCollation.php.

References getFirstLetterData().

Referenced by getFirstLetter().

◆ getPrimarySortKey()

IcuCollation::getPrimarySortKey (   $string)

Definition at line 279 of file IcuCollation.php.

Referenced by fetchFirstLetterData(), and getFirstLetter().

◆ getSortKey()

IcuCollation::getSortKey (   $string)

Given a string, convert it to a (hopefully short) key that can be used for efficient sorting.

Sorting the sortkeys as binary strings gives the same order as a logical sort of the corresponding strings. Current code expects a line feed character to sort before all others, but has no other particular expectations (and that one can be changed if necessary).

Since
1.16.3
Parameters
string $string: UTF-8 string
Returns
string Binary sortkey

Reimplemented from Collation.

Reimplemented in CollationFa, and CollationEt.

Definition at line 275 of file IcuCollation.php.
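To illustrate the property described above, here is a plain-PHP sketch with a hypothetical ASCII-only key function, `sortKeySketch()`. Real ICU keys come from the intl extension's `Collator::getSortKey()` and are far more elaborate; this only shows how a byte-wise comparison of keys (`strcmp()`) can reproduce a case-insensitive ordering with a deterministic tie-break, and why a line feed (0x0A) sorts before printable ASCII under such a scheme.

```php
<?php
// Hypothetical two-level sortkey (ASCII-only sketch): the case-folded
// string is the primary level, a 0x01 separator byte follows, and the
// original string breaks ties between case variants.
function sortKeySketch(string $s): string
{
    return strtolower($s) . "\x01" . $s;
}

// Sort strings by comparing only their binary keys.
function sortByKeySketch(array $strings): array
{
    usort($strings, function (string $a, string $b): int {
        return strcmp(sortKeySketch($a), sortKeySketch($b));
    });
    return $strings;
}

print_r(sortByKeySketch(['Ape', 'antelope', 'Aardvark']));
```

Because the key is computed once and can be stored (for example, in a database column), the collation work is not repeated at every comparison; ordering then needs only a binary comparison.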

◆ getSortKeyByLetterIndex()

IcuCollation::getSortKeyByLetterIndex (   $index)
Parameters
string $index
Returns
string
Since
1.16.3

Definition at line 498 of file IcuCollation.php.

References getFirstLetterData().

◆ getUnicodeVersionForICU()

static IcuCollation::getUnicodeVersionForICU ( )
static

Return the version of Unicode appropriate for the version of the ICU library currently in use, or false when it can't be determined.

Since
1.21
Returns
string|bool

Definition at line 552 of file IcuCollation.php.

References getICUVersion().

Referenced by GenerateCollationData\execute().
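A sketch of the kind of mapping this function performs, assuming it is a lookup keyed by the ICU release. The function name is hypothetical, and the table is a deliberately partial illustration covering only a few ICU releases, not MediaWiki's own data; a real table should be checked against ICU's release notes. One detail worth handling is that ICU's numbering jumped from 4.8 to 49, so older releases are identified by two numbers and newer ones by one.

```php
<?php
// Map an ICU version string to the Unicode version it implements, or false.
// Hypothetical, partial table for illustration only.
function unicodeVersionForIcuSketch($icuVersion)
{
    if (!is_string($icuVersion) || $icuVersion === '') {
        return false; // version unknown, e.g. intl not installed
    }
    $parts = explode('.', $icuVersion);
    $major = (int) $parts[0];
    // From ICU 49 on, the first number alone names the release ("50.1.2");
    // older releases are named by the first two numbers ("4.8.1").
    $release = $major >= 49
        ? (string) $major
        : $parts[0] . '.' . ($parts[1] ?? '0');
    $map = [
        '4.8' => '6.0',
        '49'  => '6.1',
        '50'  => '6.2',
        '52'  => '6.3',
        '54'  => '7.0',
        '56'  => '8.0',
    ];
    return $map[$release] ?? false;
}

var_dump(unicodeVersionForIcuSketch('52.1')); // string(3) "6.3"
```

Returning false for unlisted releases matches the documented contract of returning false when the version can't be determined.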

◆ isCjk()

static IcuCollation::isCjk (   $codepoint)
static

Test if a code point is a CJK (Chinese, Japanese, Korean) character.

Parameters
int $codepoint
Returns
bool
Since
1.16.3

Definition at line 516 of file IcuCollation.php.

Referenced by GenerateCollationData\charCallback().

Member Data Documentation

◆ $cjkBlocks

IcuCollation::$cjkBlocks
static private
Initial value:
= [
[ 0x2E80, 0x2EFF ],
[ 0x2F00, 0x2FDF ],
[ 0x2FF0, 0x2FFF ],
[ 0x3000, 0x303F ],
[ 0x31C0, 0x31EF ],
[ 0x3200, 0x32FF ],
[ 0x3300, 0x33FF ],
[ 0x3400, 0x4DBF ],
[ 0x4E00, 0x9FFF ],
[ 0xF900, 0xFAFF ],
[ 0xFE30, 0xFE4F ],
[ 0x20000, 0x2A6DF ],
[ 0x2A700, 0x2B73F ],
[ 0x2B740, 0x2B81F ],
[ 0x2F800, 0x2FA1F ],
]

Unified CJK blocks.

The same definition of a CJK block must be used for both Collation and generateCollationData.php. These blocks are omitted from the first letter data, as an optimisation measure and because the default UCA table is pretty useless for sorting Chinese text anyway. Japanese and Korean blocks are not included here, because they are smaller and more useful.

Definition at line 54 of file IcuCollation.php.
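The membership test that isCjk() performs can be sketched as a linear scan over the ranges listed above, assuming each pair is an inclusive first/last code point range. The names `isCjkSketch()` and `dropCjkSketch()` are hypothetical; the second illustrates the optimisation described above by filtering candidate first letters, given here as integer code points, so CJK ones never enter the first-letter data. Japanese kana such as U+3041 fall outside these ranges, consistent with the note that Japanese and Korean blocks are not included.

```php
<?php
// Unified CJK ranges copied from the $cjkBlocks listing; each pair is read
// as an inclusive [first, last] code point range.
const CJK_BLOCKS_SKETCH = [
    [0x2E80, 0x2EFF], [0x2F00, 0x2FDF], [0x2FF0, 0x2FFF],
    [0x3000, 0x303F], [0x31C0, 0x31EF], [0x3200, 0x32FF],
    [0x3300, 0x33FF], [0x3400, 0x4DBF], [0x4E00, 0x9FFF],
    [0xF900, 0xFAFF], [0xFE30, 0xFE4F], [0x20000, 0x2A6DF],
    [0x2A700, 0x2B73F], [0x2B740, 0x2B81F], [0x2F800, 0x2FA1F],
];

// Linear scan: true if the code point falls inside any listed range.
function isCjkSketch(int $codepoint): bool
{
    foreach (CJK_BLOCKS_SKETCH as $block) {
        if ($codepoint >= $block[0] && $codepoint <= $block[1]) {
            return true;
        }
    }
    return false;
}

// Hypothetical helper: drop CJK code points from candidate first letters.
function dropCjkSketch(array $codepoints): array
{
    return array_values(array_filter($codepoints, function (int $cp): bool {
        return !isCjkSketch($cp);
    }));
}

var_dump(isCjkSketch(0x4E2D)); // bool(true): U+4E2D is a CJK unified ideograph
```

With only fifteen ranges a linear scan is cheap; a binary search over the sorted ranges would also work if the list grew.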

◆ $digitTransformLanguage

Language IcuCollation::$digitTransformLanguage
protected

Definition at line 37 of file IcuCollation.php.

◆ $firstLetterData

array IcuCollation::$firstLetterData
private

Definition at line 43 of file IcuCollation.php.

Referenced by getFirstLetterData().

◆ $locale

string IcuCollation::$locale
private

Definition at line 34 of file IcuCollation.php.

Referenced by __construct().

◆ $mainCollator

Collator IcuCollation::$mainCollator
private

Definition at line 31 of file IcuCollation.php.

◆ $primaryCollator

Collator IcuCollation::$primaryCollator
private

Definition at line 28 of file IcuCollation.php.

◆ $tailoringFirstLetters

IcuCollation::$tailoringFirstLetters
static private

Additional characters (or character groups) to be considered separate letters for given languages, or to be removed from the list of such letters (denoted by keys starting with '-').

These are additions to (or subtractions from) the data stored in the first-letters-root.ser file (which, among others, includes the full basic Latin, Cyrillic and Greek alphabets).

A "separate letter" is one that would have its own heading or section in a dictionary or a phone book in this language. This data isn't used for sorting (the ICU library handles that), only for deciding which characters (or character groups) to use as headings.

Initially generated based on the primary level of Unicode collation tailorings available at http://developer.mimer.com/charts/tailorings.htm, later modified.

Empty arrays are intentional: they signify that data for the language is available and that there are, in fact, no additional letters to consider.

Definition at line 93 of file IcuCollation.php.
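A minimal sketch of how such additions and removals could be applied to a root letter list. It relies on the text's statement that removals use keys starting with '-', and further assumes, hypothetically, that such a key is '-' followed by the language code. The function name, the locale codes 'xx' and 'yy', and the letters are made up for illustration; MediaWiki's own code is not reproduced here.

```php
<?php
// Merge a language's extra letters into the root list, then drop any
// letters listed under the matching '-' key (assumed format: '-' . $locale).
function applyTailoringSketch(array $rootLetters, array $tailoring, string $locale): array
{
    $letters = $rootLetters;
    if (isset($tailoring[$locale])) {
        $letters = array_merge($letters, $tailoring[$locale]);
    }
    if (isset($tailoring['-' . $locale])) {
        // array_values() reindexes the gaps array_diff() leaves behind.
        $letters = array_values(array_diff($letters, $tailoring['-' . $locale]));
    }
    return $letters;
}

// Hypothetical data: 'xx' adds "Ch" and removes "B"; 'yy' has an
// intentionally empty entry, meaning "no additional letters".
$tailoring = ['xx' => ['Ch'], '-xx' => ['B'], 'yy' => []];
print_r(applyTailoringSketch(['A', 'B', 'C'], $tailoring, 'xx'));
```

In a real collation the merged list would still have to be ordered by the collator before it could serve as headings; that step is omitted here.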

◆ $useNumericCollation

bool IcuCollation::$useNumericCollation = false
private

Definition at line 40 of file IcuCollation.php.

◆ FIRST_LETTER_VERSION

const IcuCollation::FIRST_LETTER_VERSION = 3

Definition at line 25 of file IcuCollation.php.

◆ RECORD_LENGTH

const IcuCollation::RECORD_LENGTH = 14
Since
1.16.3

Definition at line 244 of file IcuCollation.php.


The documentation for this class was generated from the following file: