MediaWiki REL1_24
Public Member Functions
__construct ($locale)
getFirstLetter ($string)
Given a string, return the logical "first letter" to be used for grouping on category pages and so on.
getFirstLetterCount ()
getFirstLetterData ()
getLetterByIndex ($index)
getPrimarySortKey ($string)
getSortKey ($string)
Given a string, convert it to a (hopefully short) key that can be used for efficient sorting.
getSortKeyByLetterIndex ($index)
Static Public Member Functions
static getICUVersion ()
Return the version of ICU library used by PHP's intl extension, or false when the extension is not installed or the version can't be determined.
static getUnicodeVersionForICU ()
Return the version of Unicode appropriate for the version of ICU library currently in use, or false when it can't be determined.
static isCjk ($codepoint)
Public Attributes
const FIRST_LETTER_VERSION = 2
const RECORD_LENGTH = 14
Protected Attributes
Language $digitTransformLanguage
Private Attributes
array $firstLetterData
string $locale
Collator $mainCollator
Collator $primaryCollator
Static Private Attributes
static $cjkBlocks
Unified CJK blocks.
static $tailoringFirstLetters
Additional characters (or character groups) to be considered separate letters for given languages, or to be removed from the list of such letters (denoted by keys starting with '-').
Definition at line 156 of file Collation.php.
IcuCollation::__construct ( $locale )
Definition at line 294 of file Collation.php.
IcuCollation::getFirstLetter ( $string )
Given a string, return the logical "first letter" to be used for grouping on category pages and so on.
This has to be coordinated carefully with getSortKey(), or else the sorted list might jump back and forth between the same "initial letters" or show other pathological behavior. For instance, if you just return the first character, but "a" sorts the same as "A" based on getSortKey(), then you might get a list like
== A ==
* [[Aardvark]]
== a ==
* [[antelope]]
== A ==
* [[Ape]]
etc., assuming for the sake of argument that $wgCapitalLinks is false.
Parameter: string $string (UTF-8 string)
Reimplemented from Collation.
Reimplemented in CollationEt.
Definition at line 331 of file Collation.php.
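The heading flip-flop described above can be reproduced with a small standalone sketch. It does not use IcuCollation: naiveFirstLetter(), foldedFirstLetter() and headings() are hypothetical helpers, and strcasecmp() merely stands in for a sort key under which "a" and "A" compare equal.

```php
<?php
// Hypothetical helpers for illustration only; not part of MediaWiki.

// Naive approach: use the raw first byte as the heading.
function naiveFirstLetter( $string ) {
	return substr( $string, 0, 1 );
}

// Coordinated approach: fold case the same way the sort does.
function foldedFirstLetter( $string ) {
	return strtoupper( substr( $string, 0, 1 ) );
}

// Collect the headings emitted while walking an already sorted list.
function headings( array $sorted, $firstLetter ) {
	$out = array();
	$last = null;
	foreach ( $sorted as $title ) {
		$letter = call_user_func( $firstLetter, $title );
		if ( $letter !== $last ) {
			$out[] = $letter;
			$last = $letter;
		}
	}
	return $out;
}

$titles = array( 'Ape', 'antelope', 'Aardvark' );
// Case-insensitive sort, standing in for a sort key where "a" equals "A".
usort( $titles, 'strcasecmp' );

$naive = headings( $titles, 'naiveFirstLetter' );   // heading changes on every case switch
$folded = headings( $titles, 'foldedFirstLetter' ); // one heading per logical letter
```

With the naive helper the headings alternate between "A" and "a"; once the first letter is folded consistently with the sort, a single "A" heading remains.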
IcuCollation::getFirstLetterCount ( )
Definition at line 521 of file Collation.php.
IcuCollation::getFirstLetterData ( )
Definition at line 359 of file Collation.php.
static IcuCollation::getICUVersion ( ) [static]
Return the version of ICU library used by PHP's intl extension, or false when the extension is not installed or the version can't be determined.
The constant INTL_ICU_VERSION that this function refers to isn't really documented. It has been available since PHP 5.3.7 (see PHP bug 54561). This function will return false on older PHP versions.
Definition at line 549 of file Collation.php.
Referenced by GenerateCollationData\execute().
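A minimal sketch of the check described above, assuming the version is read straight from the INTL_ICU_VERSION constant. The function name icuVersionOrFalse() is hypothetical and is not the MediaWiki method itself.

```php
<?php
// Sketch only: icuVersionOrFalse() is a hypothetical stand-in.
function icuVersionOrFalse() {
	// INTL_ICU_VERSION is defined by the intl extension (PHP 5.3.7 and later).
	return defined( 'INTL_ICU_VERSION' ) ? INTL_ICU_VERSION : false;
}

$version = icuVersionOrFalse();
if ( $version === false ) {
	echo "ICU version unavailable\n";
} else {
	echo "ICU version: $version\n";
}
```

Callers should compare the result with false using ===, since the return type is mixed.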
IcuCollation::getLetterByIndex ( $index )
Definition at line 507 of file Collation.php.
IcuCollation::getPrimarySortKey ( $string )
Definition at line 324 of file Collation.php.
IcuCollation::getSortKey ( $string )
Given a string, convert it to a (hopefully short) key that can be used for efficient sorting.
A binary sort according to the sortkeys corresponds to a logical sort of the corresponding strings. Current code expects that a line feed character should sort before all others, but has no other particular expectations (and that one can be changed if necessary).
Parameter: string $string (UTF-8 string)
Reimplemented from Collation.
Reimplemented in CollationEt.
Definition at line 314 of file Collation.php.
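The relationship between binary key order and logical string order can be illustrated with PHP's own Collator class. This is a sketch, not the MediaWiki implementation: it needs the intl extension, and the 'root' locale and the sample strings are assumptions chosen for illustration.

```php
<?php
// Sketch only: requires PHP's intl extension; locale and strings are assumptions.
$strings = array( 'cherry', 'Banana', 'apple' );
$sorted = null;

if ( class_exists( 'Collator' ) ) {
	$collator = Collator::create( 'root' );
	$keys = array();
	foreach ( $strings as $s ) {
		// Collator::getSortKey() returns a binary string.
		$keys[$s] = $collator->getSortKey( $s );
	}
	// Byte-wise comparison of the keys yields the logical order of the strings.
	uasort( $keys, 'strcmp' );
	$sorted = array_keys( $keys );
}
```

A plain byte-wise sort of the strings themselves would place "Banana" before "apple"; sorting by the collation keys instead places "apple" first.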
IcuCollation::getSortKeyByLetterIndex ( $index )
Definition at line 514 of file Collation.php.
static IcuCollation::getUnicodeVersionForICU ( ) [static]
Return the version of Unicode appropriate for the version of ICU library currently in use, or false when it can't be determined.
Definition at line 560 of file Collation.php.
Referenced by GenerateCollationData\execute().
static IcuCollation::isCjk ( $codepoint ) [static]
Definition at line 528 of file Collation.php.
Referenced by GenerateCollationData\charCallback().
IcuCollation::$cjkBlocks [static, private]
array(
	array( 0x2E80, 0x2EFF ),
	array( 0x2F00, 0x2FDF ),
	array( 0x2FF0, 0x2FFF ),
	array( 0x3000, 0x303F ),
	array( 0x31C0, 0x31EF ),
	array( 0x3200, 0x32FF ),
	array( 0x3300, 0x33FF ),
	array( 0x3400, 0x4DBF ),
	array( 0x4E00, 0x9FFF ),
	array( 0xF900, 0xFAFF ),
	array( 0xFE30, 0xFE4F ),
	array( 0x20000, 0x2A6DF ),
	array( 0x2A700, 0x2B73F ),
	array( 0x2B740, 0x2B81F ),
	array( 0x2F800, 0x2FA1F ),
)
Unified CJK blocks.
The same definition of a CJK block must be used for both Collation and generateCollationData.php. These blocks are omitted from the first letter data, as an optimisation measure and because the default UCA table is pretty useless for sorting Chinese text anyway. Japanese and Korean blocks are not included here, because they are smaller and more useful.
Definition at line 178 of file Collation.php.
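A range table like the one above lends itself to a simple membership test. The following is a sketch of how such a check might look; isCjkCodepoint() is a hypothetical stand-in for isCjk(), and only a subset of the ranges is copied for brevity.

```php
<?php
// Sketch only: isCjkCodepoint() is a hypothetical stand-in for IcuCollation::isCjk().
function isCjkCodepoint( $codepoint, array $blocks ) {
	foreach ( $blocks as $block ) {
		// Each block is an inclusive array( start, end ) pair of code points.
		if ( $codepoint >= $block[0] && $codepoint <= $block[1] ) {
			return true;
		}
	}
	return false;
}

// A subset of the ranges listed in $cjkBlocks.
$blocks = array(
	array( 0x3400, 0x4DBF ),
	array( 0x4E00, 0x9FFF ),
	array( 0x20000, 0x2A6DF ),
);

$han = isCjkCodepoint( 0x4E2D, $blocks );    // U+4E2D lies inside 0x4E00..0x9FFF
$latin = isCjkCodepoint( 0x0041, $blocks );  // U+0041, Latin capital letter A
$hangul = isCjkCodepoint( 0xAC00, $blocks ); // U+AC00, a Hangul syllable, outside all ranges
```

The Hangul case reflects the note above that Korean blocks are deliberately left out of the list.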
Language IcuCollation::$digitTransformLanguage [protected]
Definition at line 165 of file Collation.php.
array IcuCollation::$firstLetterData [private]
Definition at line 167 of file Collation.php.
string IcuCollation::$locale [private]
Definition at line 163 of file Collation.php.
Collator IcuCollation::$mainCollator [private]
Definition at line 161 of file Collation.php.
Collator IcuCollation::$primaryCollator [private]
Definition at line 159 of file Collation.php.
IcuCollation::$tailoringFirstLetters [static, private]
Additional characters (or character groups) to be considered separate letters for given languages, or to be removed from the list of such letters (denoted by keys starting with '-').
These are additions to (or subtractions from) the data stored in the first-letters-root.ser file (which includes, among other things, the full basic Latin, Cyrillic and Greek alphabets).
"Separate letter" is a letter that would have a separate heading/section for it in a dictionary or a phone book in this language. This data isn't used for sorting (the ICU library handles that), only for deciding which characters (or character groups) to use as headings.
Initially generated from the primary level of the Unicode collation tailorings available at http://developer.mimer.com/charts/tailorings.htm and later modified.
Empty arrays are intended; this signifies that the data for the language is available and that there are, in fact, no additional letters to consider.
Definition at line 217 of file Collation.php.
const IcuCollation::FIRST_LETTER_VERSION = 2
Definition at line 157 of file Collation.php.
const IcuCollation::RECORD_LENGTH = 14
Definition at line 292 of file Collation.php.