MediaWiki  1.23.5
IcuCollation Class Reference

Public Member Functions

 __construct ( $locale)
 
 findLowerBound ( $valueCallback, $valueCount, $comparisonCallback, $target)
 Do a binary search, and return the index of the largest item that sorts less than or equal to the target value.
 
 getFirstLetter ( $string)
 Given a string, return the logical "first letter" to be used for grouping on category pages and so on.
 
 getFirstLetterCount ()
 
 getFirstLetterData ()
 
 getLetterByIndex ( $index)
 
 getPrimarySortKey ( $string)
 
 getSortKey ( $string)
 Given a string, convert it to a (hopefully short) key that can be used for efficient sorting.
 
 getSortKeyByLetterIndex ( $index)
 

Static Public Member Functions

static getICUVersion ()
 Return the version of ICU library used by PHP's intl extension, or false when the extension is not installed or the version can't be determined.
 
static getUnicodeVersionForICU ()
 Return the version of Unicode appropriate for the version of ICU library currently in use, or false when it can't be determined.
 
static isCjk ( $codepoint)
 
- Static Public Member Functions inherited from Collation
static factory ( $collationName)
 
static singleton ()
 

Public Attributes

 $digitTransformLanguage
 
 $firstLetterData
 
 $locale
 
 $mainCollator
 
 $primaryCollator
 
const FIRST_LETTER_VERSION = 2
 
const RECORD_LENGTH = 14
 

Static Public Attributes

static $cjkBlocks
 Unified CJK blocks.
 
static $tailoringFirstLetters
 Additional characters (or character groups) to be considered separate letters for given languages, or to be removed from the list of such letters (denoted by keys starting with '-').
 
- Static Public Attributes inherited from Collation
static $instance
 

Detailed Description

Definition at line 153 of file Collation.php.

Constructor & Destructor Documentation

◆ __construct()

IcuCollation::__construct (   $locale)

Definition at line 284 of file Collation.php.

References $locale, and Language\factory().

Member Function Documentation

◆ findLowerBound()

IcuCollation::findLowerBound (   $valueCallback,
  $valueCount,
  $comparisonCallback,
  $target 
)

Do a binary search, and return the index of the largest item that sorts less than or equal to the target value.

Deprecated:
in 1.23; use ArrayUtils::findLowerBound() instead
Parameters
array   $valueCallback       A function to call to get the value with a given array index.
int     $valueCount          The number of items accessible via $valueCallback, indexed from 0 to $valueCount - 1.
array   $comparisonCallback  A callback to compare two values, returning -1, 0 or 1 in the style of strcmp().
string  $target              The target value to find.
Returns
int|bool The item index of the lower bound, or false if the target value sorts before all items.

Definition at line 530 of file Collation.php.

References ArrayUtils\findLowerBound(), and wfDeprecated().
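
The search described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the MediaWiki implementation: lowerBoundSketch() is a hypothetical name, and only the parameter names and the return contract (index of the largest item sorting less than or equal to the target, or false) are taken from this page.

```php
<?php
/**
 * Hypothetical lower-bound binary search over items exposed via callbacks.
 * Assumes the items are already sorted according to $comparisonCallback.
 */
function lowerBoundSketch( $valueCallback, $valueCount, $comparisonCallback, $target ) {
    if ( $valueCount === 0 ) {
        return false;
    }
    $min = 0;
    $max = $valueCount;
    // Invariant: if an answer exists, its index lies in [$min, $max).
    while ( $max - $min > 1 ) {
        $mid = (int)floor( ( $min + $max ) / 2 );
        $item = call_user_func( $valueCallback, $mid );
        if ( call_user_func( $comparisonCallback, $item, $target ) > 0 ) {
            // Item at $mid sorts after the target: discard the upper half.
            $max = $mid;
        } else {
            // Item at $mid sorts less than or equal to the target: keep it.
            $min = $mid;
        }
    }
    // Only index $min is left; it may still sort after the target.
    $item = call_user_func( $valueCallback, $min );
    if ( call_user_func( $comparisonCallback, $item, $target ) > 0 ) {
        return false;
    }
    return $min;
}

// Usage with illustrative data.
$items = array( 10, 20, 30 );
$valueAt = function ( $i ) use ( $items ) {
    return $items[$i];
};
$compare = function ( $a, $b ) {
    return $a < $b ? -1 : ( $a > $b ? 1 : 0 );
};
var_dump( lowerBoundSketch( $valueAt, count( $items ), $compare, 25 ) ); // int(1)
var_dump( lowerBoundSketch( $valueAt, count( $items ), $compare, 5 ) );  // bool(false)
```

Passing the values through a callback rather than an array lets the caller search data that is computed or loaded lazily.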

◆ getFirstLetter()

IcuCollation::getFirstLetter (   $string)

Given a string, return the logical "first letter" to be used for grouping on category pages and so on.

This has to be coordinated carefully with convertToSortkey(), or else the sorted list might jump back and forth between the same "initial letters" or other pathological behavior. For instance, if you just return the first character, but "a" sorts the same as "A" based on getSortKey(), then you might get a list like

== A ==

  • [[Aardvark]]

== a ==

  • [[antelope]]

== A ==

  • [[Ape]]

etc., assuming for the sake of argument that $wgCapitalLinks is false.

Parameters
string  $string  UTF-8 string
Returns
string UTF-8 string corresponding to the first letter of input

Reimplemented from Collation.

Definition at line 321 of file Collation.php.

References ArrayUtils\findLowerBound(), getFirstLetterCount(), getLetterByIndex(), getPrimarySortKey(), and utf8ToCodepoint().
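
The heading problem described above can be reproduced with a small sketch. Everything here is a hypothetical stand-in: sketchSortKey() uses strtolower() as a toy case-insensitive sort key, and headingsInOrder() is an illustrative helper, not part of IcuCollation.

```php
<?php
// Toy sort key under which "a" and "A" sort the same (an assumption for the demo).
function sketchSortKey( $string ) {
    return strtolower( $string );
}

/**
 * Sort titles by the toy sort key, then list the headings in the order
 * they would appear on the page, using $firstLetter to label each title.
 */
function headingsInOrder( array $titles, $firstLetter ) {
    usort( $titles, function ( $a, $b ) {
        return strcmp( sketchSortKey( $a ), sketchSortKey( $b ) );
    } );
    $headings = array();
    foreach ( $titles as $title ) {
        $letter = call_user_func( $firstLetter, $title );
        // Start a new heading whenever the letter changes.
        if ( !$headings || end( $headings ) !== $letter ) {
            $headings[] = $letter;
        }
    }
    return $headings;
}

$titles = array( 'Ape', 'antelope', 'Aardvark' );

// Uncoordinated: the raw first character ignores how the sort key folds case.
$naive = function ( $s ) {
    return substr( $s, 0, 1 );
};
// Coordinated: fold case the same way the sort key does.
$coordinated = function ( $s ) {
    return strtoupper( substr( $s, 0, 1 ) );
};

echo implode( ' ', headingsInOrder( $titles, $naive ) ), "\n";       // prints "A a A"
echo implode( ' ', headingsInOrder( $titles, $coordinated ) ), "\n"; // prints "A"
```

The first result shows the list jumping back and forth between headings; the second shows one heading once the first-letter function agrees with the sort key.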

◆ getFirstLetterCount()

IcuCollation::getFirstLetterCount ( )

Definition at line 506 of file Collation.php.

References getFirstLetterData().

Referenced by getFirstLetter().

◆ getFirstLetterData()

◆ getICUVersion()

static IcuCollation::getICUVersion ( )

Return the version of ICU library used by PHP's intl extension, or false when the extension is not installed or the version can't be determined.

The constant INTL_ICU_VERSION this function refers to isn't really documented. It is available since PHP 5.3.7 (see PHP bug 54561). This function will return false on older PHP versions.

Since
1.21
Returns
string|false

Definition at line 556 of file Collation.php.

Referenced by GenerateCollationData\execute(), and getUnicodeVersionForICU().
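
A check of this kind can be sketched as below. The function name icuVersionSketch() is hypothetical; the only things taken from this page are the INTL_ICU_VERSION constant and the string|false return contract.

```php
<?php
/**
 * Hypothetical sketch: report the ICU version exposed by the intl extension,
 * or false when the constant is not defined (extension missing or PHP too old).
 */
function icuVersionSketch() {
    return defined( 'INTL_ICU_VERSION' ) ? INTL_ICU_VERSION : false;
}

$version = icuVersionSketch();
echo $version === false ? "ICU version unavailable\n" : "ICU version: $version\n";
```

Testing with defined() avoids an error on installations where the intl extension is not loaded.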

◆ getLetterByIndex()

IcuCollation::getLetterByIndex (   $index)

Definition at line 492 of file Collation.php.

References getFirstLetterData().

Referenced by getFirstLetter().

◆ getPrimarySortKey()

IcuCollation::getPrimarySortKey (   $string)

Definition at line 314 of file Collation.php.

References wfRestoreWarnings(), and wfSuppressWarnings().

Referenced by getFirstLetter(), and getFirstLetterData().

◆ getSortKey()

IcuCollation::getSortKey (   $string)

Given a string, convert it to a (hopefully short) key that can be used for efficient sorting.

A binary sort according to the sortkeys corresponds to a logical sort of the corresponding strings. Current code expects that a line feed character should sort before all others, but has no other particular expectations (and that one can be changed if necessary).

Parameters
string  $string  UTF-8 string
Returns
string Binary sortkey

Reimplemented from Collation.

Definition at line 304 of file Collation.php.

References wfRestoreWarnings(), and wfSuppressWarnings().
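
The relationship between sort keys and logical order can be illustrated with a toy key function. This is a sketch under stated assumptions: sortBySortKeySketch() is a hypothetical helper, and strtolower() stands in for a real collation key. With the intl extension loaded, a callback wrapping Collator::getSortKey() could presumably be passed instead.

```php
<?php
/**
 * Hypothetical sketch: sort strings by comparing their sort keys byte-wise.
 * $keyFn maps a string to its key; the strings themselves are never compared.
 */
function sortBySortKeySketch( array $strings, $keyFn ) {
    $keyed = array();
    foreach ( $strings as $string ) {
        // Compute each key once and keep it next to its string.
        $keyed[] = array( call_user_func( $keyFn, $string ), $string );
    }
    usort( $keyed, function ( $a, $b ) {
        // Binary comparison of keys stands in for the logical comparison.
        return strcmp( $a[0], $b[0] );
    } );
    return array_map( function ( $pair ) {
        return $pair[1];
    }, $keyed );
}

$strings = array( 'cherry', 'Banana', 'apple' );
// Toy case-insensitive key (an assumption for the demo).
echo implode( ', ', sortBySortKeySketch( $strings, 'strtolower' ) ), "\n";
// prints "apple, Banana, cherry"
```

A plain byte-wise sort of the same strings would place "Banana" first, because uppercase ASCII letters sort before lowercase ones.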

◆ getSortKeyByLetterIndex()

IcuCollation::getSortKeyByLetterIndex (   $index)

Definition at line 499 of file Collation.php.

References getFirstLetterData().

◆ getUnicodeVersionForICU()

static IcuCollation::getUnicodeVersionForICU ( )

Return the version of Unicode appropriate for the version of ICU library currently in use, or false when it can't be determined.

Since
1.21
Returns
string|false

Definition at line 567 of file Collation.php.

References getICUVersion().

Referenced by GenerateCollationData\execute().
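
One way such a lookup could work is a prefix table keyed on the start of the ICU version string. The sketch below is an assumption about the approach, not the MediaWiki code: unicodeVersionSketch() is a hypothetical name, and the table is a small illustrative sample that should be checked against the ICU release notes before any real use.

```php
<?php
/**
 * Hypothetical sketch: map an ICU version string to a Unicode version
 * using the first three characters of the ICU version as the lookup key.
 */
function unicodeVersionSketch( $icuVersion, array $map ) {
    if ( !$icuVersion ) {
        // No ICU version available, so nothing can be determined.
        return false;
    }
    $prefix = substr( $icuVersion, 0, 3 );
    return isset( $map[$prefix] ) ? $map[$prefix] : false;
}

// Illustrative sample only; not a complete or authoritative table.
$sampleMap = array(
    '50.' => '6.2',
    '49.' => '6.1',
    '4.8' => '6.0',
    '4.4' => '5.2',
);

var_dump( unicodeVersionSketch( '4.8.1.1', $sampleMap ) ); // string(3) "6.0"
var_dump( unicodeVersionSketch( '99.1', $sampleMap ) );    // bool(false)
```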

◆ isCjk()

static IcuCollation::isCjk (   $codepoint)

Definition at line 535 of file Collation.php.

Referenced by GenerateCollationData\charCallback().

Member Data Documentation

◆ $cjkBlocks

IcuCollation::$cjkBlocks
static
Initial value:
= array(
	array( 0x2E80, 0x2EFF ),
	array( 0x2F00, 0x2FDF ),
	array( 0x2FF0, 0x2FFF ),
	array( 0x3000, 0x303F ),
	array( 0x31C0, 0x31EF ),
	array( 0x3200, 0x32FF ),
	array( 0x3300, 0x33FF ),
	array( 0x3400, 0x4DBF ),
	array( 0x4E00, 0x9FFF ),
	array( 0xF900, 0xFAFF ),
	array( 0xFE30, 0xFE4F ),
	array( 0x20000, 0x2A6DF ),
	array( 0x2A700, 0x2B73F ),
	array( 0x2B740, 0x2B81F ),
	array( 0x2F800, 0x2FA1F ),
)

Unified CJK blocks.

The same definition of a CJK block must be used for both Collation and generateCollationData.php. These blocks are omitted from the first letter data, as an optimisation measure and because the default UCA table is pretty useless for sorting Chinese text anyway. Japanese and Korean blocks are not included here, because they are smaller and more useful.

Definition at line 168 of file Collation.php.
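
A membership test against block ranges like these can be sketched as below. This is an assumption about how such a check might look, not the body of isCjk(): isInBlocksSketch() is a hypothetical helper, and only two of the ranges listed above are used as sample data.

```php
<?php
/**
 * Hypothetical sketch: report whether a code point falls inside any of the
 * given inclusive ranges, each written as array( first, last ).
 */
function isInBlocksSketch( $codepoint, array $blocks ) {
    foreach ( $blocks as $block ) {
        if ( $codepoint >= $block[0] && $codepoint <= $block[1] ) {
            return true;
        }
    }
    return false;
}

// Two ranges copied from the initial value above.
$sampleBlocks = array(
    array( 0x3400, 0x4DBF ),
    array( 0x4E00, 0x9FFF ),
);

var_dump( isInBlocksSketch( 0x4E2D, $sampleBlocks ) ); // bool(true): U+4E2D lies in 0x4E00-0x9FFF
var_dump( isInBlocksSketch( 0x3042, $sampleBlocks ) ); // bool(false): U+3042 is Hiragana
```

A linear scan is enough for a list this short; a binary search over the sorted ranges would only pay off for a much longer table.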

◆ $digitTransformLanguage

IcuCollation::$digitTransformLanguage

Definition at line 156 of file Collation.php.

◆ $firstLetterData

IcuCollation::$firstLetterData

Definition at line 157 of file Collation.php.

Referenced by getFirstLetterData().

◆ $locale

IcuCollation::$locale

Definition at line 156 of file Collation.php.

Referenced by __construct().

◆ $mainCollator

IcuCollation::$mainCollator

Definition at line 156 of file Collation.php.

◆ $primaryCollator

IcuCollation::$primaryCollator

Definition at line 156 of file Collation.php.

◆ $tailoringFirstLetters

IcuCollation::$tailoringFirstLetters
static

Additional characters (or character groups) to be considered separate letters for given languages, or to be removed from the list of such letters (denoted by keys starting with '-').

These are additions to (or subtractions from) the data stored in the first-letters-root.ser file (which includes, among others, the full basic Latin, Cyrillic and Greek alphabets).

"Separate letter" is a letter that would have a separate heading/section for it in a dictionary or a phone book in this language. This data isn't used for sorting (the ICU library handles that), only for deciding which characters (or character groups) to use as headings.

Initially generated based on the primary level of Unicode collation tailorings available at http://developer.mimer.com/charts/tailorings.htm and later modified.

Empty arrays are intended; this signifies that the data for the language is available and that there are, in fact, no additional letters to consider.

Definition at line 207 of file Collation.php.
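
The addition and removal convention described above can be sketched as follows. All names and data are hypothetical: applyTailoringSketch() is an illustrative helper, and 'xx' and 'yy' are made-up language codes. Only the '-' key prefix for removals and the meaning of empty arrays come from this page.

```php
<?php
/**
 * Hypothetical sketch: apply per-language additions and removals to a base
 * list of heading letters. Keys prefixed with '-' list letters to remove.
 */
function applyTailoringSketch( array $letters, array $tailoring, $lang ) {
    if ( isset( $tailoring[$lang] ) ) {
        // Letters that get their own heading in this language.
        $letters = array_merge( $letters, $tailoring[$lang] );
    }
    if ( isset( $tailoring['-' . $lang] ) ) {
        // Letters that should not get their own heading in this language.
        $letters = array_diff( $letters, $tailoring['-' . $lang] );
    }
    return array_values( array_unique( $letters ) );
}

// Made-up data for illustration only.
$sampleTailoring = array(
    'xx' => array( 'Å' ),
    '-xx' => array( 'C' ),
    'yy' => array(), // data is available, and there is nothing to add
);
$base = array( 'A', 'B', 'C' );

echo implode( ' ', applyTailoringSketch( $base, $sampleTailoring, 'xx' ) ), "\n"; // prints "A B Å"
echo implode( ' ', applyTailoringSketch( $base, $sampleTailoring, 'yy' ) ), "\n"; // prints "A B C"
```

The sketch only decides which letters exist as headings; their order would still come from the collator, as the description above notes.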

◆ FIRST_LETTER_VERSION

const IcuCollation::FIRST_LETTER_VERSION = 2

Definition at line 154 of file Collation.php.

◆ RECORD_LENGTH

const IcuCollation::RECORD_LENGTH = 14

Definition at line 282 of file Collation.php.


The documentation for this class was generated from the following file:

  • Collation.php