PHP Cross Reference of MediaWiki-1.24.0

/includes/normal/ -> UtfNormal.php (summary)

Unicode normalization routines
Copyright © 2004 Brion Vibber <[email protected]>
https://www.mediawiki.org/

File Size: 790 lines (23 kb)
Included or required: 6 times
Referenced: 0 times
Includes or requires: 2 files
 includes/normal/UtfNormalData.inc
 includes/normal/UtfNormalDataK.inc

Defines 1 class

UtfNormal:: (17 methods):
  cleanUp()
  toNFC()
  toNFD()
  toNFKC()
  toNFKD()
  loadData()
  quickIsNFC()
  quickIsNFCVerify()
  NFC()
  NFD()
  NFKC()
  NFKD()
  fastDecompose()
  fastCombiningSort()
  fastCompose()
  placebo()
  replaceForNativeNormalize()


Class: UtfNormal  - X-Ref

Unicode normalization routines for working with UTF-8 strings.
Currently assumes that input strings are valid UTF-8!

Not as fast as I'd like, but should be usable for most purposes.
UtfNormal::toNFC() will bail early if given ASCII text or text
it can quickly determine is already normalized.

All functions can be called statically.

See description of forms at http://www.unicode.org/reports/tr15/

cleanUp( $string )   X-Ref
The ultimate convenience function! Clean up invalid UTF-8 sequences,
and convert to normal form C, canonical composition.

Fast return for pure ASCII strings; some lesser optimizations for
strings containing only known-good characters. Not as fast as toNFC().

param: string $string a UTF-8 string
return: string a clean, shiny, normalized UTF-8 string
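
The two steps described for cleanUp() (repair invalid UTF-8, then convert to
normal form C) can be illustrated outside PHP. The following is a minimal
sketch in Python using only the standard library; clean_up is a hypothetical
name, and substituting U+FFFD for invalid bytes is the behaviour of Python's
decoder, assumed here for illustration rather than taken from UtfNormal.php.

```python
import unicodedata


def clean_up(raw: bytes) -> str:
    """Hypothetical helper: repair invalid UTF-8, then normalize to NFC."""
    # Fast path: pure ASCII is always valid UTF-8 and already in NFC.
    if raw.isascii():
        return raw.decode("ascii")
    # Python's decoder substitutes U+FFFD for bytes it cannot decode.
    text = raw.decode("utf-8", errors="replace")
    return unicodedata.normalize("NFC", text)


ascii_result = clean_up(b"plain text")
composed_result = clean_up(b"e\xcc\x81")   # "e" followed by U+0301 in UTF-8
repaired_result = clean_up(b"a\xffb")      # 0xFF can never occur in UTF-8
```

The ASCII check comes first because it is the cheapest test and lets the
common case skip both decoding repair and normalization.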

toNFC( $string )   X-Ref
Convert a UTF-8 string to normal form C, canonical composition.
Fast return for pure ASCII strings; some lesser optimizations for
strings containing only known-good characters.

param: string $string a valid UTF-8 string. Input is not validated.
return: string a UTF-8 string in normal form C
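
To show why normal form C matters, here is a small Python sketch (not the
PHP implementation) in which two visually identical spellings of the same
word compare unequal until both are normalized. The name to_nfc and the
ASCII shortcut are assumptions modelled on the description above.

```python
import unicodedata


def to_nfc(text: str) -> str:
    """Hypothetical stand-in for an NFC conversion with an ASCII fast return."""
    # ASCII-only text is unchanged by NFC, so hand it back untouched.
    if text.isascii():
        return text
    return unicodedata.normalize("NFC", text)


precomposed = "caf\u00e9"    # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "cafe\u0301"    # "e" followed by U+0301 COMBINING ACUTE ACCENT

equal_before = precomposed == decomposed
equal_after = to_nfc(precomposed) == to_nfc(decomposed)
```

Comparing or indexing strings only after normalization avoids treating the
two spellings as different titles or keys.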

toNFD( $string )   X-Ref
Convert a UTF-8 string to normal form D, canonical decomposition.
Fast return for pure ASCII strings.

param: string $string a valid UTF-8 string. Input is not validated.
return: string a UTF-8 string in normal form D
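
Canonical decomposition can be inspected by listing code points before and
after conversion. This Python sketch uses the standard unicodedata module to
illustrate the form itself; code_points is a hypothetical helper, and nothing
here reflects how UtfNormal.php is written.

```python
import unicodedata


def code_points(text: str) -> list:
    """Hypothetical helper: render each character as a U+XXXX label."""
    return [f"U+{ord(ch):04X}" for ch in text]


# U+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE splits into "A" plus a mark.
single_level = code_points(unicodedata.normalize("NFD", "\u00c5"))

# U+1E69 decomposes in two steps (first to U+1E63 plus U+0307); NFD applies
# the mappings recursively until only base letters and marks remain.
multi_level = code_points(unicodedata.normalize("NFD", "\u1e69"))
```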

toNFKC( $string )   X-Ref
Convert a UTF-8 string to normal form KC, compatibility composition.
This may cause irreversible information loss; use judiciously.
Fast return for pure ASCII strings.

param: string $string a valid UTF-8 string. Input is not validated.
return: string a UTF-8 string in normal form KC

toNFKD( $string )   X-Ref
Convert a UTF-8 string to normal form KD, compatibility decomposition.
This may cause irreversible information loss; use judiciously.
Fast return for pure ASCII strings.

param: string $string a valid UTF-8 string. Input is not validated.
return: string a UTF-8 string in normal form KD
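
The information loss mentioned for the compatibility forms can be seen with
a ligature and a superscript. This is a sketch in Python's standard
unicodedata module, shown only to illustrate the forms; the variable names
are illustrative.

```python
import unicodedata

ligature = "\ufb01le"   # U+FB01 LATIN SMALL LIGATURE FI followed by "le"
squared = "x\u00b2"     # "x" followed by U+00B2 SUPERSCRIPT TWO

# Compatibility forms fold both characters into plain letters and digits.
nfkc_ligature = unicodedata.normalize("NFKC", ligature)
nfkc_squared = unicodedata.normalize("NFKC", squared)

# Canonical forms leave the ligature alone.
nfc_ligature = unicodedata.normalize("NFC", ligature)

# Normalizing the folded text again cannot bring the superscript back.
round_trip = unicodedata.normalize("NFC", nfkc_squared)
```

Because the folded text no longer records that a superscript or ligature was
present, the KC and KD forms suit search keys and comparisons better than
stored content.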

loadData()   X-Ref
Load the basic composition data if necessary


quickIsNFC( $string )   X-Ref
Returns true if the string is _definitely_ in NFC.
Returns false if not or uncertain.

param: string $string a valid UTF-8 string. Input is not validated.
return: bool
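
The "definitely yes, otherwise no or uncertain" contract can be sketched with
a deliberately conservative test. The Python example below relies on the fact
that text made up only of code points below U+0300 is always in NFC. That
threshold is an assumption of this sketch and is much coarser than whatever
data-driven check UtfNormal.php performs; quick_is_nfc and normalize_if_needed
are hypothetical names.

```python
import unicodedata


def quick_is_nfc(text: str) -> bool:
    """True only when text is certainly NFC; False means 'not NFC or unsure'."""
    # No code point below U+0300 is a combining mark or is altered by NFC.
    return all(ord(ch) < 0x300 for ch in text)


def normalize_if_needed(text: str) -> str:
    """Skip the full conversion whenever the quick check already says yes."""
    if quick_is_nfc(text):
        return text
    return unicodedata.normalize("NFC", text)


certain = quick_is_nfc("caf\u00e9")       # all code points below U+0300
not_nfc = quick_is_nfc("cafe\u0301")      # contains a combining mark
unsure = quick_is_nfc("\u4e2d\u6587")     # already NFC, but the check cannot tell
```

A False result is safe in both remaining cases because the caller simply
falls back to full normalization.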

quickIsNFCVerify( &$string )   X-Ref
Returns true if the string is _definitely_ in NFC.
Returns false if not or uncertain.

param: string $string a UTF-8 string, altered on output to be valid UTF-8 safe for XML.
return: bool

NFC( $string )   X-Ref

param: string $string
return: string

NFD( $string )   X-Ref

param: string $string
return: string

NFKC( $string )   X-Ref

param: string $string
return: string

NFKD( $string )   X-Ref

param: string $string
return: string

fastDecompose( $string, $map )   X-Ref
Perform decomposition of a UTF-8 string into either D or KD form
(depending on which decomposition map is passed to us).
Input is assumed to be *valid* UTF-8. Invalid input will break the routine.

param: string $string valid UTF-8 string
param: array $map hash of expanded decomposition map
return: string a UTF-8 string decomposed, not yet normalized (needs sorting)
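
The idea of passing in the decomposition map, so that one routine serves both
D and KD, can be sketched as follows. This is Python with tiny hand-written
maps standing in for the real data tables; the function and map names are
hypothetical, and cases such as Hangul syllables are left out.

```python
def fast_decompose(text: str, decomposition_map: dict) -> str:
    """Replace each character by its fully expanded mapping, if it has one."""
    return "".join(decomposition_map.get(ch, ch) for ch in text)


# "Expanded" means each entry already maps to its final sequence.
CANONICAL_MAP = {
    "\u00e9": "e\u0301",   # e with acute
    "\u1e0b": "d\u0307",   # d with dot above
}
COMPATIBILITY_MAP = {**CANONICAL_MAP, "\ufb01": "fi"}   # adds the fi ligature

canonical = fast_decompose("\u00e9\ufb01", CANONICAL_MAP)
compatibility = fast_decompose("\u00e9\ufb01", COMPATIBILITY_MAP)

# U+0307 (class 230) now precedes U+0323 (class 220), so the output is
# decomposed but not yet in canonical order.
unsorted_marks = fast_decompose("\u1e0b\u0323", CANONICAL_MAP)
```

The last value shows why the return description says "needs sorting": mapping
characters one by one can leave combining marks out of canonical order.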

fastCombiningSort( $string )   X-Ref
Sorts combining characters into canonical order. This is the
final step in creating decomposed normal forms D and KD.

param: string $string a valid, decomposed UTF-8 string. Input is not validated.
return: string a UTF-8 string with combining characters sorted in canonical order
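
Canonical ordering amounts to a stable sort of each run of combining marks by
combining class. The Python sketch below uses unicodedata.combining() from the
standard library to look up the class; combining_sort is a hypothetical name,
and the code illustrates the ordering rule rather than the PHP routine.

```python
import unicodedata


def combining_sort(text: str) -> str:
    """Stable-sort each run of combining marks by canonical combining class."""
    out = []
    run = []   # combining marks seen since the last starter
    for ch in text:
        if unicodedata.combining(ch):   # non-zero class: a combining mark
            run.append(ch)
        else:
            # sorted() is stable, so marks of equal class keep their order.
            out.extend(sorted(run, key=unicodedata.combining))
            run = []
            out.append(ch)
    out.extend(sorted(run, key=unicodedata.combining))
    return "".join(out)


reordered = combining_sort("d\u0307\u0323")    # class 230 before class 220
same_class = combining_sort("a\u0301\u0300")   # both marks are class 230
```

Stability matters: two marks of the same class are not interchangeable, so
their relative order has to survive the sort.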

fastCompose( $string )   X-Ref
Produces canonically composed sequences, i.e. normal form C or KC.

param: string $string a valid UTF-8 string in sorted normal form D or KD.
return: string a UTF-8 string with canonical precomposed characters used

placebo( $string )   X-Ref
This is just used for the benchmark, comparing how long it takes to
iterate through a string without really doing anything of substance.

param: string $string
return: string

replaceForNativeNormalize( $string )   X-Ref
Function to replace some characters that we don't want
but most of the native normalize functions keep.

param: string $string The string
return: string String with the character codes replaced.



Generated: Fri Nov 28 14:03:12 2014 Cross-referenced by PHPXref 0.7.1