MediaWiki 1.29.2
CollationEt Class Reference

Workaround for incorrect collation of Estonian language ('et') in ICU (T56168).


Public Member Functions

 __construct ()
 
 getFirstLetter ( $string)
 Given a string, return the logical "first letter" to be used for grouping on category pages and so on.
 
 getSortKey ( $string)
 Given a string, convert it to a (hopefully short) key that can be used for efficient sorting.
 
- Public Member Functions inherited from IcuCollation
 __construct ( $locale)
 
 getFirstLetterCount ()
 
 getFirstLetterData ()
 
 getLetterByIndex ( $index)
 
 getPrimarySortKey ( $string)
 
 getSortKeyByLetterIndex ( $index)
 

Static Private Member Functions

static mangle ( $string)
 
static unmangle ( $string)
 

Additional Inherited Members

- Static Public Member Functions inherited from IcuCollation
static getICUVersion ()
 Return the version of the ICU library used by PHP's intl extension, or false when the extension is not installed or the version can't be determined.
 
static getUnicodeVersionForICU ()
 Return the version of Unicode appropriate for the version of ICU library currently in use, or false when it can't be determined.
 
static isCjk ( $codepoint)
 Test if a code point is a CJK (Chinese, Japanese, Korean) character.
 
- Static Public Member Functions inherited from Collation
static factory ( $collationName)
 
static singleton ()
 
- Public Attributes inherited from IcuCollation
const FIRST_LETTER_VERSION = 2
 
const RECORD_LENGTH = 14
 
- Protected Attributes inherited from IcuCollation
Language $digitTransformLanguage
 

Detailed Description

Workaround for incorrect collation of Estonian language ('et') in ICU (T56168).

'W' and 'V' should not be considered the same letter for the purposes of collation in modern Estonian. We work around this by replacing 'W' and 'w' with 'ᴡ' U+1D21 'LATIN LETTER SMALL CAPITAL W' for sortkey generation, which is collated like 'W' and is not tailored to have the same primary weight as 'V' in Estonian.

Since
1.24

Definition at line 31 of file CollationEt.php.
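
The substitution described above can be sketched in a few lines of PHP. This is only an illustration under stated assumptions, not necessarily the code in CollationEt.php: the function names mangleSketch() and unmangleSketch() and the constant SMALL_CAPITAL_W are hypothetical, and the reverse mapping to an uppercase 'W' is an assumption about how a first letter would be shown to readers.

```php
<?php
// U+1D21 LATIN LETTER SMALL CAPITAL W, the stand-in character named above.
const SMALL_CAPITAL_W = "\u{1D21}";

// Hypothetical helper: replace 'w' and 'W' before a sortkey is generated,
// so the Estonian tailoring that equates W with V no longer applies.
// str_replace() is byte-based, which is safe here because the ASCII bytes
// for 'w' and 'W' never occur inside a multi-byte UTF-8 sequence.
function mangleSketch(string $string): string {
    return str_replace(['w', 'W'], SMALL_CAPITAL_W, $string);
}

// Hypothetical helper: turn the stand-in back into a displayable letter.
// The original casing cannot be recovered, so 'W' is assumed.
function unmangleSketch(string $string): string {
    return str_replace(SMALL_CAPITAL_W, 'W', $string);
}

$mangled  = mangleSketch('watt');        // "ᴡatt"
$restored = unmangleSketch($mangled);    // "Watt" (lowercase 'w' is lost)
```

Because both 'w' and 'W' map to the same stand-in, the round trip is lossy. That is acceptable for sortkeys and first-letter headings, but the mangled string should not be used for display.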

Constructor & Destructor Documentation

◆ __construct()

CollationEt::__construct ( )

Definition at line 32 of file CollationEt.php.

Member Function Documentation

◆ getFirstLetter()

CollationEt::getFirstLetter (   $string)

Given a string, return the logical "first letter" to be used for grouping on category pages and so on.

This has to be coordinated carefully with getSortKey(), or else the sorted list might jump back and forth between the same "initial letters" or show other pathological behavior. For instance, if you just return the first character, but "a" sorts the same as "A" based on getSortKey(), then you might get a list like

== A ==

  • [[Aardvark]]

== a ==

  • [[antelope]]

== A ==

  • [[Ape]]

etc., assuming for the sake of argument that $wgCapitalLinks is false.

Since
1.16.3
Parameters
string  $string  UTF-8 string
Returns
string UTF-8 string corresponding to the first letter of input

Reimplemented from IcuCollation.

Definition at line 57 of file CollationEt.php.

References unmangle().
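
The pathological grouping described above can be reproduced with a small sketch. The functions toyKey() and headings() are hypothetical stand-ins, not part of MediaWiki's Collation API, and the example assumes ASCII titles so that substr() returns a whole character.

```php
<?php
// Hypothetical sort key: case-insensitive, so "a" and "A" sort the same.
function toyKey(string $s): string {
    return strtolower($s);
}

// Sort titles by toyKey(), then emit a heading whenever the first letter
// reported by $firstLetter changes from one title to the next.
function headings(array $titles, callable $firstLetter): array {
    usort($titles, fn($a, $b) => strcmp(toyKey($a), toyKey($b)));
    $out = [];
    $prev = null;
    foreach ($titles as $title) {
        $letter = $firstLetter($title);
        if ($letter !== $prev) {
            $out[] = $letter;
            $prev = $letter;
        }
    }
    return $out;
}

$titles = ['Ape', 'antelope', 'Aardvark'];

// Uncoordinated: the raw first character disagrees with the sort key.
$naive = headings($titles, fn($s) => substr($s, 0, 1));
// $naive is ['A', 'a', 'A']

// Coordinated: the first letter is folded the same way the key is.
$coordinated = headings($titles, fn($s) => strtoupper(substr($s, 0, 1)));
// $coordinated is ['A']
```

The same reasoning suggests why a subclass that alters the input to getSortKey() would also need to override getFirstLetter(): both must agree on which strings share a letter.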

◆ getSortKey()

CollationEt::getSortKey (   $string)

Given a string, convert it to a (hopefully short) key that can be used for efficient sorting.

A binary sort according to the sortkeys corresponds to a logical sort of the corresponding strings. Current code expects that a line feed character should sort before all others, but has no other particular expectations (and that one can be changed if necessary).

Since
1.16.3
Parameters
string  $string  UTF-8 string
Returns
string Binary sortkey

Reimplemented from IcuCollation.

Definition at line 53 of file CollationEt.php.
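
As a rough illustration of "a binary sort according to the sortkeys", the sketch below precomputes a key per string and orders the strings with a plain byte comparison. toySortKey() and sortByKey() are hypothetical; a real ICU sortkey is an opaque binary string, whereas this toy key is just the lowercased input.

```php
<?php
// Hypothetical key function standing in for a collation's getSortKey().
function toySortKey(string $s): string {
    return strtolower($s);
}

// Compute each key once, then sort by byte-wise comparison of the keys.
function sortByKey(array $strings): array {
    $keyed = [];
    foreach ($strings as $s) {
        $keyed[] = [toySortKey($s), $s];
    }
    usort($keyed, fn($a, $b) => strcmp($a[0], $b[0]));
    return array_column($keyed, 1);
}

$sorted = sortByKey(["Zebra", "apple\nextra", "Apple pie", "banana"]);
// $sorted is ["apple\nextra", "Apple pie", "banana", "Zebra"]
```

With this toy key the line feed (byte 0x0A) compares lower than the space and every letter, so "apple\nextra" lands before "Apple pie". A real collation has to preserve that ordering explicitly to meet the expectation stated above.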

◆ mangle()

static CollationEt::mangle (   $string)
static private

Definition at line 36 of file CollationEt.php.

◆ unmangle()

static CollationEt::unmangle (   $string)
static private

Definition at line 44 of file CollationEt.php.

Referenced by getFirstLetter().


The documentation for this class was generated from the following file: