MediaWiki  master
UtfNormal Class Reference

Unicode normalization routines for working with UTF-8 strings.

Static Public Member Functions

static cleanUp ($string)
 The ultimate convenience function! Clean up invalid UTF-8 sequences, and convert to normal form C, canonical composition.
 
static quickIsNFC ($string)
 Returns true if the string is definitely in NFC.
 
static quickIsNFCVerify (&$string)
 Returns true if the string is definitely in NFC.
 
static toNFC ($string)
 Convert a UTF-8 string to normal form C, canonical composition.
 
static toNFD ($string)
 Convert a UTF-8 string to normal form D, canonical decomposition.
 
static toNFKC ($string)
 Convert a UTF-8 string to normal form KC, compatibility composition.
 
static toNFKD ($string)
 Convert a UTF-8 string to normal form KD, compatibility decomposition.
 

Detailed Description

Unicode normalization routines for working with UTF-8 strings.

Currently assumes that input strings are valid UTF-8!

Not as fast as I'd like, but should be usable for most purposes. UtfNormal::toNFC() will bail early if given ASCII text or text it can quickly determine is already normalized.
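The early bail-out for ASCII text works because pure 7-bit ASCII is unchanged by every Unicode normalization form. A minimal sketch of that idea follows; isPureAscii() and normalizeWithFastPath() are hypothetical helpers written for illustration, not part of UtfNormal, and the real class may detect ASCII differently.

```php
<?php
// Hypothetical helper, not part of UtfNormal: reports whether a string
// contains only 7-bit ASCII bytes. Such text is already in NFC, NFD,
// NFKC and NFKD, so no normalization work is needed.
function isPureAscii(string $s): bool {
    // Look for any byte with the high bit set; no match means pure ASCII.
    return preg_match('/[\x80-\xff]/', $s) === 0;
}

// Hypothetical wrapper: skip the expensive normalizer for ASCII input.
// $slowNormalize stands in for whatever full normalization routine is used.
function normalizeWithFastPath(string $s, callable $slowNormalize): string {
    if (isPureAscii($s)) {
        return $s; // nothing to normalize
    }
    return $slowNormalize($s);
}
```

Because the check is a single pass over the bytes, it is cheap compared with decomposing and recomposing every character.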

All functions can be called statically.

See description of forms at http://www.unicode.org/reports/tr15/

Deprecated:
Since 1.25. Use UtfNormal\Validator directly.
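For code that has to run both where UtfNormal\Validator is available and where only this deprecated class is, a small shim can pick whichever is loaded. This is a sketch under assumptions: cleanUpCompat() is a hypothetical name, and it assumes UtfNormal\Validator exposes a static cleanUp() that takes and returns a string like the method documented here.

```php
<?php
// Hypothetical migration shim, not part of MediaWiki.
function cleanUpCompat(string $text): string {
    // Preferred: the namespaced class named in the deprecation note.
    if (class_exists('UtfNormal\\Validator')) {
        return \UtfNormal\Validator::cleanUp($text);
    }
    // Fallback: the deprecated wrapper documented on this page.
    if (class_exists('UtfNormal')) {
        return \UtfNormal::cleanUp($text);
    }
    throw new RuntimeException('No UtfNormal implementation is loaded');
}
```

New code can usually call UtfNormal\Validator directly and skip the shim.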

Definition at line 48 of file UtfNormal.php.

Member Function Documentation

static UtfNormal::cleanUp ($string)

The ultimate convenience function! Clean up invalid UTF-8 sequences, and convert to normal form C, canonical composition.

Fast return for pure ASCII strings; some lesser optimizations for strings containing only known-good characters. Not as fast as toNFC().

Parameters
string $string a UTF-8 string
Returns
string a clean, shiny, normalized UTF-8 string

Definition at line 59 of file UtfNormal.php.
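The two steps described above, repairing invalid sequences and then composing to NFC, can be sketched with PHP's mbstring and intl extensions. This is a stand-in for illustration, not the UtfNormal implementation: cleanUpSketch() is a hypothetical name, it assumes both extensions are installed, and the replacement character it produces for bad bytes may differ from what cleanUp() emits.

```php
<?php
// Hypothetical stand-in for illustration; requires ext-mbstring and ext-intl.
function cleanUpSketch(string $s): string {
    // Fast path: pure ASCII is valid UTF-8 and already in NFC.
    if (preg_match('/[\x80-\xff]/', $s) === 0) {
        return $s;
    }
    // Replace ill-formed byte sequences with mbstring's substitute character.
    $valid = mb_scrub($s, 'UTF-8');
    // Compose to normal form C; keep the scrubbed text if normalization fails.
    $nfc = Normalizer::normalize($valid, Normalizer::FORM_C);
    return $nfc === false ? $valid : $nfc;
}
```

Scrubbing comes first because the normalizer expects well-formed UTF-8 input.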

static UtfNormal::quickIsNFC ($string)

Returns true if the string is definitely in NFC.

Returns false if not or uncertain.

Parameters
string $string a valid UTF-8 string. Input is not validated.
Returns
bool

Definition at line 116 of file UtfNormal.php.

static UtfNormal::quickIsNFCVerify (&$string)

Returns true if the string is definitely in NFC.

Returns false if not or uncertain.

Parameters
string $string a UTF-8 string, altered on output to be valid UTF-8 safe for XML.
Returns
bool

Definition at line 126 of file UtfNormal.php.

static UtfNormal::toNFC ($string)

Convert a UTF-8 string to normal form C, canonical composition.

Fast return for pure ASCII strings; some lesser optimizations for strings containing only known-good characters.

Parameters
string $string a valid UTF-8 string. Input is not validated.
Returns
string a UTF-8 string in normal form C

Definition at line 71 of file UtfNormal.php.

static UtfNormal::toNFD ($string)

Convert a UTF-8 string to normal form D, canonical decomposition.

Fast return for pure ASCII strings.

Parameters
string $string a valid UTF-8 string. Input is not validated.
Returns
string a UTF-8 string in normal form D

Definition at line 82 of file UtfNormal.php.

static UtfNormal::toNFKC ($string)

Convert a UTF-8 string to normal form KC, compatibility composition.

This may cause irreversible information loss; use judiciously. Fast return for pure ASCII strings.

Parameters
string $string a valid UTF-8 string. Input is not validated.
Returns
string a UTF-8 string in normal form KC

Definition at line 94 of file UtfNormal.php.
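The information loss in the compatibility forms can be seen with a ligature. The sketch below uses the Normalizer class from PHP's intl extension as a stand-in for UtfNormal; compareForms() is a hypothetical helper, and the example assumes intl is installed.

```php
<?php
// Hypothetical helper using ext-intl as a stand-in for UtfNormal.
// Returns the canonical (NFC) and compatibility (NFKC) forms of a string.
function compareForms(string $s): array {
    return [
        'NFC'  => Normalizer::normalize($s, Normalizer::FORM_C),
        'NFKC' => Normalizer::normalize($s, Normalizer::FORM_KC),
    ];
}

if (class_exists('Normalizer')) {
    // U+FB01 LATIN SMALL LIGATURE FI has only a compatibility decomposition.
    $forms = compareForms("\u{FB01}");
    var_dump($forms['NFC'] === "\u{FB01}"); // canonical form keeps the ligature
    var_dump($forms['NFKC'] === 'fi');      // compatibility form folds it to "fi"
}
```

Once the ligature has been folded to the two letters "fi", no normalization form restores it, which is why the compatibility forms are better suited to search keys and comparisons than to stored text.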

static UtfNormal::toNFKD ($string)

Convert a UTF-8 string to normal form KD, compatibility decomposition.

This may cause irreversible information loss; use judiciously. Fast return for pure ASCII strings.

Parameters
string $string a valid UTF-8 string. Input is not validated.
Returns
string a UTF-8 string in normal form KD

Definition at line 106 of file UtfNormal.php.


The documentation for this class was generated from the following file: