[Inheritance and collaboration diagrams for IcuCollation]

Public Member Functions
	__construct (LanguageFactory $languageFactory, $locale)

	getFirstLetter ( $string)
	Given a string, return the logical "first letter" to be used for grouping on category pages and so on.

	getFirstLetterCount ()

	getFirstLetterData ()

	getLetterByIndex ( $index)

	getPrimarySortKey ( $string)

	getSortKey ( $string)
	Given a string, convert it to a (hopefully short) key that can be used for efficient sorting.

	getSortKeyByLetterIndex ( $index)

Static Public Member Functions
static	getUnicodeVersionForICU ()
	Return the version of Unicode appropriate for the version of ICU library currently in use, or false when it can't be determined.

static	isCjk ( $codepoint)
	Test if a code point is a CJK (Chinese, Japanese, Korean) character.

Static Public Member Functions inherited from Collation
static	factory ( $collationName)

static	singleton ()

Protected Attributes
Language	$digitTransformLanguage

Private Member Functions
	fetchFirstLetterData ()

	getPrecompiledData ( $name)
	Get an object from the precompiled serialized directory.

Private Attributes
array	$firstLetterData

string	$locale

Collator	$mainCollator

Collator	$primaryCollator

bool	$useNumericCollation = false

const	CJK_BLOCKS
	Unified CJK blocks.

const	FIRST_LETTER_VERSION = 4

const	TAILORING_FIRST_LETTERS
	Additional characters (or character groups) to be considered separate letters for given languages, or to be removed from the list of such letters (denoted by keys starting with '-').

Detailed Description

Since: 1.16.3

Definition at line 26 of file IcuCollation.php.

Constructor & Destructor Documentation

◆ __construct()

IcuCollation::__construct	(	LanguageFactory	$languageFactory,
			$locale
	)

Parameters

LanguageFactory	$languageFactory
string	$locale

Definition at line 248 of file IcuCollation.php.

References $locale, and MediaWiki\Languages\LanguageFactory\getLanguage().

Member Function Documentation

◆ fetchFirstLetterData()

IcuCollation::fetchFirstLetterData ( )

private

Returns: array

Exceptions

MWException

Definition at line 350 of file IcuCollation.php.

References $IP, getPrecompiledData(), getPrimarySortKey(), and wfDebug().

Referenced by getFirstLetterData().

◆ getFirstLetter()

IcuCollation::getFirstLetter ( $string )

Given a string, return the logical "first letter" to be used for grouping on category pages and so on.

This has to be coordinated carefully with getSortKey(), or else the sorted list might jump back and forth between the same "initial letters" or show other pathological behavior. For instance, if you return only the first character, but "a" sorts the same as "A" according to getSortKey(), then you might get a list like

== A ==

[[Aardvark]]

== a ==

[[antelope]]

== A ==

[[Ape]]

etc., assuming for the sake of argument that $wgCapitalLinks is false.
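The effect described above can be reproduced with a small sketch. Everything in it is hypothetical and independent of IcuCollation: toySortKey() is a case-insensitive stand-in for getSortKey(), and the two first-letter functions only illustrate the difference between an uncoordinated and a coordinated choice of heading.

```php
<?php
// Toy stand-in for getSortKey(): case-insensitive, so "a" and "A" sort together.
function toySortKey( string $s ): string {
	return strtolower( $s );
}

// Naive first letter: the raw first character, which disagrees with toySortKey().
function naiveFirstLetter( string $s ): string {
	return substr( $s, 0, 1 );
}

// Coordinated first letter: derived from the same normalization as the sort key.
function coordinatedFirstLetter( string $s ): string {
	return strtoupper( substr( toySortKey( $s ), 0, 1 ) );
}

// Sort titles by sort key, then collect the heading emitted each time the letter changes.
function headings( array $titles, callable $firstLetter ): array {
	usort( $titles, static function ( $a, $b ) {
		return strcmp( toySortKey( $a ), toySortKey( $b ) );
	} );
	$out = [];
	$prev = null;
	foreach ( $titles as $title ) {
		$letter = $firstLetter( $title );
		if ( $letter !== $prev ) {
			$out[] = $letter;
			$prev = $letter;
		}
	}
	return $out;
}

$titles = [ 'Ape', 'antelope', 'Aardvark' ];
$naive = headings( $titles, 'naiveFirstLetter' );
$coordinated = headings( $titles, 'coordinatedFirstLetter' );
```

With the naive function the headings alternate between "A" and "a"; with the coordinated one a single "A" heading covers all three titles.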

Since: 1.16.3

Parameters

string $string UTF-8 string

Returns: string UTF-8 string corresponding to the first letter of input

Reimplemented from Collation.

Definition at line 283 of file IcuCollation.php.

References getFirstLetterCount(), getLetterByIndex(), getPrimarySortKey(), and wfMessage().

◆ getFirstLetterCount()

IcuCollation::getFirstLetterCount ( )

Returns: int

Since: 1.16.3

Definition at line 529 of file IcuCollation.php.

References getFirstLetterData().

Referenced by getFirstLetter().

◆ getFirstLetterData()

IcuCollation::getFirstLetterData ( )

Since: 1.16.3

Returns: array

Definition at line 328 of file IcuCollation.php.

References $cache, $firstLetterData, CACHE_ANYTHING, and fetchFirstLetterData().

Referenced by getFirstLetterCount(), getLetterByIndex(), and getSortKeyByLetterIndex().

◆ getLetterByIndex()

IcuCollation::getLetterByIndex ( $index )

Parameters

string $index

Returns: string

Since: 1.16.3

Definition at line 512 of file IcuCollation.php.

References getFirstLetterData().

Referenced by getFirstLetter().

◆ getPrecompiledData()

IcuCollation::getPrecompiledData ( $name )

private

Get an object from the precompiled serialized directory.

Replaces the former use of wfGetPrecompiledData().

Parameters

string $name

Returns: mixed The variable on success, false on failure

Definition at line 495 of file IcuCollation.php.

References $blob, $file, $IP, and unserialize().

Referenced by fetchFirstLetterData().

◆ getPrimarySortKey()

IcuCollation::getPrimarySortKey ( $string )

Definition at line 279 of file IcuCollation.php.

Referenced by fetchFirstLetterData(), and getFirstLetter().

◆ getSortKey()

IcuCollation::getSortKey ( $string )

Given a string, convert it to a (hopefully short) key that can be used for efficient sorting.

A binary sort according to the sortkeys corresponds to a logical sort of the corresponding strings. Current code expects that a line feed character should sort before all others, but has no other particular expectations (and that one can be changed if necessary).
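As a rough illustration of the binary-sort property, the sketch below sorts strings by comparing only their ICU sort keys byte by byte. It assumes the PHP intl extension is loaded, uses 'en' as an arbitrary example locale, and the function name sortByIcuKey() is hypothetical; it is not meant to mirror how IcuCollation configures its collators.

```php
<?php
// Assumes the PHP intl extension is available; 'en' is an arbitrary example locale.
function sortByIcuKey( array $strings, string $locale = 'en' ): array {
	$collator = new Collator( $locale );
	// Pair each string with its binary sort key.
	$keyed = [];
	foreach ( $strings as $s ) {
		$keyed[] = [ $collator->getSortKey( $s ), $s ];
	}
	// Compare only the keys, byte by byte; the strings themselves are never compared.
	usort( $keyed, static function ( $a, $b ) {
		return strcmp( $a[0], $b[0] );
	} );
	return array_column( $keyed, 1 );
}
```

A plain strcmp() on the strings would place "Banana" before "apple", whereas comparing the keys should yield the dictionary-style order a reader expects. This is what makes it possible to store the key in a database column and sort on it with a binary comparison.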

Since: 1.16.3

Parameters

string $string UTF-8 string

Returns: string Binary sortkey

Reimplemented from Collation.

Definition at line 275 of file IcuCollation.php.

◆ getSortKeyByLetterIndex()

IcuCollation::getSortKeyByLetterIndex ( $index )

Parameters

string $index

Returns: string

Since: 1.16.3

Definition at line 521 of file IcuCollation.php.

References getFirstLetterData().

◆ getUnicodeVersionForICU()

static IcuCollation::getUnicodeVersionForICU ( )

static

Return the version of Unicode appropriate for the version of ICU library currently in use, or false when it can't be determined.

Since: 1.21

Returns: string|bool

Definition at line 555 of file IcuCollation.php.

Referenced by GenerateCollationData\execute().

◆ isCjk()

static IcuCollation::isCjk ( $codepoint )

static

Test if a code point is a CJK (Chinese, Japanese, Korean) character.

Parameters

int $codepoint

Returns: bool

Since: 1.16.3

Definition at line 539 of file IcuCollation.php.

Referenced by GenerateCollationData\charCallback().

Member Data Documentation

◆ $digitTransformLanguage

Language IcuCollation::$digitTransformLanguage

protected

Definition at line 39 of file IcuCollation.php.

◆ $firstLetterData

array IcuCollation::$firstLetterData

private

Definition at line 45 of file IcuCollation.php.

Referenced by getFirstLetterData().

◆ $locale

string IcuCollation::$locale

private

Definition at line 36 of file IcuCollation.php.

Referenced by __construct().

◆ $mainCollator

Collator IcuCollation::$mainCollator

private

Definition at line 33 of file IcuCollation.php.

◆ $primaryCollator

Collator IcuCollation::$primaryCollator

private

Definition at line 30 of file IcuCollation.php.

◆ $useNumericCollation

bool IcuCollation::$useNumericCollation = false

private

Definition at line 42 of file IcuCollation.php.

◆ CJK_BLOCKS

const IcuCollation::CJK_BLOCKS

private

Initial value:

= [
        [ 0x2E80, 0x2EFF ], 
        [ 0x2F00, 0x2FDF ], 
        [ 0x2FF0, 0x2FFF ], 
        [ 0x3000, 0x303F ], 
        [ 0x31C0, 0x31EF ], 
        [ 0x3200, 0x32FF ], 
        [ 0x3300, 0x33FF ], 
        [ 0x3400, 0x4DBF ], 
        [ 0x4E00, 0x9FFF ], 
        [ 0xF900, 0xFAFF ], 
        [ 0xFE30, 0xFE4F ], 
        [ 0x20000, 0x2A6DF ], 
        [ 0x2A700, 0x2B73F ], 
        [ 0x2B740, 0x2B81F ], 
        [ 0x2F800, 0x2FA1F ], 
    ]

Unified CJK blocks.

The same definition of a CJK block must be used for both Collation and generateCollationData.php. These blocks are omitted from the first letter data, as an optimisation measure and because the default UCA table is pretty useless for sorting Chinese text anyway. Japanese and Korean blocks are not included here, because they are smaller and more useful.
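A membership test against these blocks can be sketched as a simple range scan. The ranges are copied from the initial value shown above; the names EXAMPLE_CJK_BLOCKS and exampleIsCjk() are hypothetical, and the actual isCjk() implementation may differ.

```php
<?php
// Ranges copied from the CJK_BLOCKS initial value in this documentation.
const EXAMPLE_CJK_BLOCKS = [
	[ 0x2E80, 0x2EFF ],
	[ 0x2F00, 0x2FDF ],
	[ 0x2FF0, 0x2FFF ],
	[ 0x3000, 0x303F ],
	[ 0x31C0, 0x31EF ],
	[ 0x3200, 0x32FF ],
	[ 0x3300, 0x33FF ],
	[ 0x3400, 0x4DBF ],
	[ 0x4E00, 0x9FFF ],
	[ 0xF900, 0xFAFF ],
	[ 0xFE30, 0xFE4F ],
	[ 0x20000, 0x2A6DF ],
	[ 0x2A700, 0x2B73F ],
	[ 0x2B740, 0x2B81F ],
	[ 0x2F800, 0x2FA1F ],
];

// Return true if the code point falls inside any listed block (bounds inclusive).
function exampleIsCjk( int $codepoint ): bool {
	foreach ( EXAMPLE_CJK_BLOCKS as [ $start, $end ] ) {
		if ( $codepoint >= $start && $codepoint <= $end ) {
			return true;
		}
	}
	return false;
}
```

For example, U+4E2D lies in the 0x4E00 to 0x9FFF block and is reported as CJK, while the Hiragana letter U+3042 and the Hangul syllable U+AC00 fall outside every listed range, consistent with the note that Japanese and Korean blocks are not included.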

Definition at line 56 of file IcuCollation.php.

◆ FIRST_LETTER_VERSION

const IcuCollation::FIRST_LETTER_VERSION = 4

private

Definition at line 27 of file IcuCollation.php.

◆ TAILORING_FIRST_LETTERS

const IcuCollation::TAILORING_FIRST_LETTERS

private

Additional characters (or character groups) to be considered separate letters for given languages, or to be removed from the list of such letters (denoted by keys starting with '-').

These are additions to (or subtractions from) the data stored in the first-letters-root.php data file (which among others includes full basic Latin, Cyrillic and Greek alphabets).

"Separate letter" is a letter that would have a separate heading/section for it in a dictionary or a phone book in this language. This data isn't used for sorting (the ICU library handles that), only for deciding which characters (or character groups) to use as headings.

Initially generated based on the primary level of Unicode collation tailorings available at http://developer.mimer.com/charts/tailorings.htm, later modified.

Empty arrays are intended; this signifies that the data for the language is available and that there are, in fact, no additional letters to consider.
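The add/remove convention can be sketched as follows. The locale codes 'xx' and 'yy', the letters, and the function exampleHeadingLetters() are all hypothetical; only the convention of a '-' prefixed key marking removals and an empty array marking "no additional letters" is taken from the description above. The real class may also order the resulting letters, which this sketch does not attempt.

```php
<?php
// Apply tailoring to a root letter list: add letters for $locale,
// then remove those listed under the '-' prefixed key.
function exampleHeadingLetters( array $rootLetters, array $tailoring, string $locale ): array {
	$additions = $tailoring[$locale] ?? [];
	$removals = $tailoring['-' . $locale] ?? [];
	$letters = array_merge( $rootLetters, $additions );
	$letters = array_diff( $letters, $removals );
	// Reindex and drop duplicates; no sorting is performed here.
	return array_values( array_unique( $letters ) );
}

// Hypothetical tailoring data in the shape the description suggests.
$exampleTailoring = [
	'xx' => [ 'Ñ' ],       // extra heading letter for locale 'xx'
	'-xx' => [ 'Q', 'W' ], // letters that get no heading of their own in 'xx'
	'yy' => [],            // data available, nothing to add
];

$xxLetters = exampleHeadingLetters( [ 'A', 'Q', 'W', 'Z' ], $exampleTailoring, 'xx' );
$yyLetters = exampleHeadingLetters( [ 'A', 'Q', 'W', 'Z' ], $exampleTailoring, 'yy' );
```

Here 'xx' ends up with the headings A, Z and Ñ, while 'yy' keeps the root list unchanged.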

Definition at line 95 of file IcuCollation.php.

The documentation for this class was generated from the following file:

includes/collation/IcuCollation.php
