Symbian
Symbian OS Library

SYMBIAN OS V9.3


Location: e32cmn.h
Link against: euser.lib

Class TChar

class TChar;

Description

Holds a character value and provides a number of utility functions to manipulate it and test its properties.

For example, there are functions to convert the character to uppercase and test whether or not it is a control character.

The character value is stored as a 32-bit unsigned integer. The shorthand "TChar value" is used to describe the character value wrapped by a TChar object.

TChar can be used to represent Unicode values outside plane 0 (that is, the extended Unicode range from 0x10000 to 0xFFFFF). This differentiates it from TText which can only be used for 16-bit Unicode character values.
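To illustrate why a 16-bit type cannot hold such values directly, the following sketch splits a code point above 0xFFFF into the two 16-bit surrogate units that UTF-16 uses for it. It is standard C++ rather than Symbian code, and the function names are hypothetical.

```cpp
#include <cstdint>
#include <utility>

// True if the value fits in a single 16-bit unit (plane 0).
inline bool FitsInOne16BitUnit(std::uint32_t aCodePoint)
{
    return aCodePoint <= 0xFFFFu;
}

// Splits a code point outside plane 0 into a UTF-16 surrogate pair.
// Precondition: 0x10000 <= aCodePoint <= 0x10FFFF.
// Returns {high surrogate, low surrogate}.
inline std::pair<std::uint16_t, std::uint16_t> ToSurrogatePair(std::uint32_t aCodePoint)
{
    const std::uint32_t offset = aCodePoint - 0x10000u;  // 20 significant bits
    const std::uint16_t high = static_cast<std::uint16_t>(0xD800u + (offset >> 10));
    const std::uint16_t low = static_cast<std::uint16_t>(0xDC00u + (offset & 0x3FFu));
    return {high, low};
}
```

A 32-bit character value holds the same code point as one number, so no pairing is needed.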

Members

Defined in TChar:
Compose(), Decompose(), EAlphaGroup, EArabicNumber, EBoundaryNeutral, ECcCategory, ECfCategory, ECnCategory, ECoCategory, ECommonNumberSeparator, EControlGroup, ECsCategory, EEuropeanNumber, EEuropeanNumberSeparator, EEuropeanNumberTerminator, EFoldAccents, EFoldAll, EFoldCase, EFoldDigits, EFoldKana, EFoldSpaces, EFoldStandard, EFoldWidth, EFullWidth, EHalfWidth, ELeftToRight, ELeftToRightEmbedding, ELeftToRightOverride, ELetterModifierGroup, ELetterOtherGroup, ELlCategory, ELmCategory, ELoCategory, ELtCategory, ELuCategory, EMarkGroup, EMaxAssignedCategory, EMaxAssignedGroup, EMaxGraphicCategory, EMaxLetterCategory, EMaxLetterOrLetterModifierCategory, EMaxPrintableCategory, EMcCategory, EMeCategory, EMnCategory, ENarrow, ENdCategory, ENeutralWidth, ENlCategory, ENoCategory, ENonSpacingMark, ENumberGroup, EOtherNeutral, EParagraphSeparator, EPcCategory, EPdCategory, EPeCategory, EPfCategory, EPiCategory, EPoCategory, EPopDirectionalFormat, EPsCategory, EPunctuationGroup, ERightToLeft, ERightToLeftArabic, ERightToLeftEmbedding, ERightToLeftOverride, EScCategory, ESegmentSeparator, ESeparatorGroup, EShiftJIS, ESkCategory, ESmCategory, ESoCategory, ESymbolGroup, EUnassignedGroup, EUnicode, EWhitespace, EWide, EZlCategory, EZpCategory, EZsCategory, Eos(), Fold(), Fold(), GetBdCategory(), GetCategory(), GetCjkWidth(), GetCombiningClass(), GetInfo(), GetLowerCase(), GetNumericValue(), GetTitleCase(), GetUpperCase(), IsAlpha(), IsAlphaDigit(), IsAssigned(), IsControl(), IsDigit(), IsGraph(), IsHexDigit(), IsLower(), IsMirrored(), IsPrint(), IsPunctuation(), IsSpace(), IsTitle(), IsUpper(), LowerCase(), SetChar(), TBdCategory, TCategory, TChar(), TChar(), TCharInfo, TCjkWidth, TEncoding, TitleCase(), UpperCase(), anonymous, operator TUint(), operator+(), operator+=(), operator-(), operator-=()


Construction and destruction


TChar()

inline TChar();

Description

Default constructor.

Constructs this character object with an undefined value.


TChar()

inline TChar(TUint aChar);

Description

Constructs this character object and initialises it with the specified value.

Parameters

TUint aChar

The initialisation value.


Member functions


operator-=()

inline TChar &operator-=(TUint aChar);

Description

Subtracts an unsigned integer value from this character object.

This character object is changed by the operation.

Parameters

TUint aChar

The value to be subtracted.

Return value

TChar &

A reference to this character object.


operator+=()

inline TChar &operator+=(TUint aChar);

Description

Adds an unsigned integer value to this character object.

This character object is changed by the operation.

Parameters

TUint aChar

The value to be added.

Return value

TChar &

A reference to this character object.


operator-()

inline TChar operator-(TUint aChar);

Description

Gets the result of subtracting an unsigned integer value from this character object.

This character object is not changed.

Parameters

TUint aChar

The value to be subtracted.

Return value

TChar

A character object whose value is the result of the subtraction operation.


operator+()

inline TChar operator+(TUint aChar);

Description

Gets the result of adding an unsigned integer value to this character object.

This character object is not changed.

Parameters

TUint aChar

The value to be added.

Return value

TChar

A character object whose value is the result of the addition operation.
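The difference between the compound operators (which change the object) and the binary operators (which return a new object) can be sketched with a small stand-in class. CharValue and Value() are hypothetical names used only for this illustration; this is not the real TChar, and the sketch marks the binary operators const as a design choice of its own.

```cpp
#include <cstdint>

// Hypothetical stand-in for illustration only; not the real TChar.
class CharValue
{
public:
    explicit CharValue(std::uint32_t aChar) : iChar(aChar) {}

    // Compound forms modify this object and return a reference to it.
    CharValue& operator+=(std::uint32_t aChar) { iChar += aChar; return *this; }
    CharValue& operator-=(std::uint32_t aChar) { iChar -= aChar; return *this; }

    // Binary forms leave this object unchanged and return a new object.
    CharValue operator+(std::uint32_t aChar) const { return CharValue(iChar + aChar); }
    CharValue operator-(std::uint32_t aChar) const { return CharValue(iChar - aChar); }

    // Read access to the wrapped value.
    std::uint32_t Value() const { return iChar; }

private:
    std::uint32_t iChar;
};
```

With this stand-in, evaluating c + 1 for a c holding 0x61 yields a new object holding 0x62 and leaves c at 0x61, whereas c += 1 changes c itself.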


operator TUint()

inline operator TUint() const;

Description

Gets the value of the character as an unsigned integer.

The operator casts a TChar to a TUint, returning the TUint value wrapped by this character object.

Return value

TUint

The character value wrapped by this character object.


Fold()

inline void Fold();

Description

Converts the character to a form which can be used in tolerant comparisons. This overload performs the standard set of fold operations and offers no control over which ones are applied.

Tolerant comparisons are those which ignore character differences like case and accents.

This function can be used when searching for a string in a text file or a file in a directory.

Folding performs the following conversions: converts to lowercase; strips accents; converts all digits representing the values 0..9 to the ordinary digit characters '0'..'9'; converts all spaces (standard, non-break, fixed-width, ideographic, etc.) to the ordinary space character (0x0020); converts Japanese characters in the hiragana syllabary to katakana; and converts East Asian halfwidth and fullwidth variants to their ordinary forms.

You can choose to perform any subset of these operations by using the other function overload.


LowerCase()

inline void LowerCase();

Description

Converts the character to its lowercase form.

Characters lacking a lowercase form are unchanged.


UpperCase()

inline void UpperCase();

Description

Converts the character to its uppercase form.

Characters lacking an uppercase form are unchanged.


Eos()

inline TBool Eos() const;

Description

Tests whether the character is the C/C++ end-of-string character - 0.

Return value

TBool

True, if the character is 0; false, otherwise.


GetUpperCase()

IMPORT_C TUint GetUpperCase() const;

Description

Gets the character value after conversion to uppercase or the character's own value, if no uppercase form exists.

The character object itself is not changed.

Return value

TUint

The character value after conversion to uppercase.


GetLowerCase()

IMPORT_C TUint GetLowerCase() const;

Description

Gets the character value after conversion to lowercase or the character's own value, if no lowercase form exists.

The character object itself is not changed.

Return value

TUint

The character value after conversion to lowercase.


IsLower()

IMPORT_C TBool IsLower() const;

Description

Tests whether the character is lowercase.

Return value

TBool

True, if the character is lowercase; false, otherwise.


IsUpper()

IMPORT_C TBool IsUpper() const;

Description

Tests whether the character is uppercase.

Return value

TBool

True, if the character is uppercase; false, otherwise.


IsAlpha()

IMPORT_C TBool IsAlpha() const;

Description

Tests whether the character is alphabetic.

For Unicode, the function returns TRUE for all letters, including those from syllabaries and ideographic scripts. The function returns FALSE for letter-like characters that are in fact diacritics. Specifically, the function returns TRUE for categories: ELuCategory, ELtCategory, ELlCategory, and ELoCategory; it returns FALSE for all other categories including ELmCategory.

Return value

TBool

True, if the character is alphabetic; false, otherwise.


IsDigit()

IMPORT_C TBool IsDigit() const;

Description

Tests whether the character is a standard decimal digit.

For Unicode, this function returns TRUE only for the digits '0'...'9' (U+0030...U+0039), not for other digits in scripts like Arabic, Tamil, etc.

Return value

TBool

True, if the character is a standard decimal digit; false, otherwise.


IsAlphaDigit()

IMPORT_C TBool IsAlphaDigit() const;

Description

Tests whether the character is alphabetic or a decimal digit.

It is identical to (IsAlpha()||IsDigit()).

Return value

TBool

True, if the character is alphabetic or a decimal digit; false, otherwise.


IsHexDigit()

IMPORT_C TBool IsHexDigit() const;

Description

Tests whether the character is a hexadecimal digit (0-9, a-f, A-F).

Return value

TBool

True, if the character is a hexadecimal digit; false, otherwise.


IsSpace()

IMPORT_C TBool IsSpace() const;

Description

Tests whether the character is a white space character.

White space includes spaces, tabs and separators.

For Unicode, the function returns TRUE for all characters in the categories: EZsCategory, EZlCategory and EZpCategory, and also for the characters 0x0009 (horizontal tab), 0x000A (linefeed), 0x000B (vertical tab), 0x000C (form feed), and 0x000D (carriage return).

Return value

TBool

True, if the character is white space; false, otherwise.
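The white space rule described above can be sketched as a predicate over a code point and its category. The SpaceCategory enum and IsSpaceSketch() are hypothetical names for this illustration; the sketch does not call the real TChar API and does not rely on the real TCategory values.

```cpp
#include <cstdint>

// Hypothetical category tags for illustration only.
enum class SpaceCategory { Zs, Zl, Zp, Cc, Ll, Other };

// Mirrors the documented rule: characters in the separator categories
// Zs, Zl and Zp, plus the characters 0x0009..0x000D, are white space.
inline bool IsSpaceSketch(std::uint32_t aChar, SpaceCategory aCategory)
{
    if (aCategory == SpaceCategory::Zs ||
        aCategory == SpaceCategory::Zl ||
        aCategory == SpaceCategory::Zp)
    {
        return true;
    }
    return aChar >= 0x0009u && aChar <= 0x000Du;
}
```

The tab (0x0009) is in the control category, which is why the rule needs the explicit code point range as well as the category test.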


IsPunctuation()

IMPORT_C TBool IsPunctuation() const;

Description

Tests whether the character is a punctuation character.

For Unicode, punctuation characters are any character in the categories: EPcCategory, EPdCategory, EPsCategory, EPeCategory, EPiCategory, EPfCategory, EPoCategory.

Return value

TBool

True, if the character is punctuation; false, otherwise.


IsGraph()

IMPORT_C TBool IsGraph() const;

Description

Tests whether the character is a graphic character.

For Unicode, graphic characters include printable characters but not the space character. Specifically, graphic characters are any character except those in categories: EZsCategory, EZlCategory, EZpCategory, ECcCategory, ECfCategory, ECsCategory, ECoCategory, and ECnCategory.

Note that for ISO Latin-1, all alphanumeric and punctuation characters are graphic.

Return value

TBool

True, if the character is a graphic character; false, otherwise.


IsPrint()

IMPORT_C TBool IsPrint() const;

Description

Tests whether the character is a printable character.

For Unicode, printable characters are any character except those in categories: ECcCategory, ECfCategory, ECsCategory, ECoCategory and ECnCategory.

Note that for ISO Latin-1, all alphanumeric and punctuation characters, plus space, are printable.

Return value

TBool

True, if the character is printable; false, otherwise.
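The relationship between printable and graphic characters, as described for IsPrint() and IsGraph(), can be sketched as two predicates over a category. PrintCategory and both function names are hypothetical and stand in for the real TCategory values.

```cpp
// Hypothetical category tags for illustration only.
enum class PrintCategory { Lu, Nd, Po, Zs, Zl, Zp, Cc, Cf, Cs, Co, Cn };

// Printable: everything except control, format, surrogate,
// private use and unassigned characters.
inline bool IsPrintSketch(PrintCategory aCategory)
{
    switch (aCategory)
    {
        case PrintCategory::Cc:
        case PrintCategory::Cf:
        case PrintCategory::Cs:
        case PrintCategory::Co:
        case PrintCategory::Cn:
            return false;
        default:
            return true;
    }
}

// Graphic: printable, and additionally not a separator (Zs, Zl, Zp).
inline bool IsGraphSketch(PrintCategory aCategory)
{
    if (aCategory == PrintCategory::Zs ||
        aCategory == PrintCategory::Zl ||
        aCategory == PrintCategory::Zp)
    {
        return false;
    }
    return IsPrintSketch(aCategory);
}
```

Under these rules a space separator is printable but not graphic, while a letter is both.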


IsControl()

IMPORT_C TBool IsControl() const;

Description

Tests whether the character is a control character.

For Unicode, the function returns TRUE for all characters in the categories: ECcCategory, ECfCategory, ECsCategory, ECoCategory and ECnCategory.

Return value

TBool

True, if the character is a control character; false, otherwise.


Fold()

inline void Fold(TInt aFlags);

Description

Converts the character to a form which can be used in tolerant comparisons allowing selection of the specific fold operations to be performed.

Parameters

TInt aFlags

Flags which define the operations to be performed. The values are defined in the enum beginning with EFoldCase.
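How a flags argument selects individual fold operations can be sketched as follows. The flag names and values (kFoldCase, kFoldSpaces, kFoldWidth) and FoldSketch() are hypothetical, and the sketch covers only a handful of characters; it is not the real implementation of Fold().

```cpp
#include <cstdint>

// Hypothetical flag values for this sketch only.
enum
{
    kFoldCase = 1 << 0,
    kFoldSpaces = 1 << 1,
    kFoldWidth = 1 << 2
};

// Applies only the fold operations selected in aFlags.
inline std::uint32_t FoldSketch(std::uint32_t aChar, int aFlags)
{
    // Width: fullwidth ASCII variants 0xFF01..0xFF5E map to 0x0021..0x007E.
    if ((aFlags & kFoldWidth) && aChar >= 0xFF01u && aChar <= 0xFF5Eu)
        aChar -= 0xFEE0u;

    // Spaces: no-break space and ideographic space map to 0x0020.
    if ((aFlags & kFoldSpaces) && (aChar == 0x00A0u || aChar == 0x3000u))
        aChar = 0x0020u;

    // Case: ASCII uppercase letters only, to keep the sketch small.
    if ((aFlags & kFoldCase) && aChar >= 0x41u && aChar <= 0x5Au)
        aChar += 0x20u;

    return aChar;
}
```

Combining flags with bitwise OR selects several operations at once; for example, kFoldWidth | kFoldCase turns a fullwidth 'A' (0xFF21) into an ordinary 'a'.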


TitleCase()

inline void TitleCase();

Description

Converts the character to its titlecase form.

The titlecase form of a character is identical to its uppercase form unless a specific titlecase form exists. Characters lacking a titlecase form are unchanged.


GetTitleCase()

IMPORT_C TUint GetTitleCase() const;

Description

Gets the character value after conversion to titlecase or the character's own value, if no titlecase form exists.

The titlecase form of a character is identical to its uppercase form unless a specific titlecase form exists.

Return value

TUint

The value of the character value after conversion to titlecase form.


IsTitle()

IMPORT_C TBool IsTitle() const;

Description

Tests whether this character is in titlecase.

Return value

TBool

True, if this character is in titlecase; false, otherwise.


IsAssigned()

IMPORT_C TBool IsAssigned() const;

Description

Tests whether this character has an assigned meaning in the Unicode encoding.

All characters outside the range 0x0000 - 0xFFFF are unassigned and there are also many unassigned characters within the Unicode range.

Locales can change the assigned/unassigned status of characters. This means that the precise behaviour of this function is locale-dependent.

Return value

TBool

True, if this character has an assigned meaning; false, otherwise.


GetInfo()

IMPORT_C void GetInfo(TCharInfo &aInfo) const;

Description

Gets this character's standard category information.

This includes everything except its CJK width and decomposition, if any.

Parameters

TCharInfo &aInfo

On return, contains the character's standard category information.


GetCategory()

IMPORT_C TCategory GetCategory() const;

Description

Gets this character's Unicode category.

Return value

TCategory

This character's Unicode category.


GetBdCategory()

IMPORT_C TBdCategory GetBdCategory() const;

Description

Gets the bi-directional category of a character.

For more information on the bi-directional algorithm, see Unicode Technical Report No. 9 available at: http://www.unicode.org/unicode/reports/tr9/.

Return value

TBdCategory

The character's bi-directional category.


GetCombiningClass()

IMPORT_C TInt GetCombiningClass() const;

Description

Gets this character's combining class.

Note that diacritics and other combining characters have non-zero combining classes.

Return value

TInt

The combining class.


IsMirrored()

IMPORT_C TBool IsMirrored() const;

Description

Tests whether this character has the mirrored property.

Mirrored characters, like ( ) [ ] < >, change direction according to the directionality of the surrounding characters. For example, an opening parenthesis 'faces right' in Hebrew or Arabic, and to say that 2 < 3 you would have to say that 3 > 2, where the '>' is, in this example, a less-than sign to be read right-to-left.

Return value

TBool

True, if this character has the mirrored property; false, otherwise.


GetNumericValue()

IMPORT_C TInt GetNumericValue() const;

Description

Gets the integer numeric value of this character.

Numeric values need not be in the range 0..9; the Unicode character set includes various other numeric characters such as the Roman and Tamil numerals for 500, 1000, etc.

Return value

TInt

The numeric value: -1 if the character has no integer numeric value, -2 if the character has a fractional numeric value.
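Because the returned value can be -1, -2, or a number larger than 9, callers that build a decimal number need to check each value before using it as a digit. The sketch below works on plain integers as they might be returned by GetNumericValue(); the function name and its -1 failure result are hypothetical choices for this illustration.

```cpp
#include <vector>

// Builds a decimal number from per-character numeric values.
// Returns -1 if the sequence is empty or if any value is not usable as a
// decimal digit: the -1 and -2 sentinels, or a value above 9 such as 500.
inline long DecimalFromNumericValues(const std::vector<int>& aValues)
{
    if (aValues.empty())
        return -1;

    long result = 0;
    for (int value : aValues)
    {
        if (value < 0 || value > 9)
            return -1;
        result = result * 10 + value;
    }
    return result;
}
```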


GetCjkWidth()

IMPORT_C TCjkWidth GetCjkWidth() const;

Description

Gets the Chinese, Japanese, Korean (CJK) notional width.

Some display systems used in East Asia display characters on a grid of fixed-width character cells like the standard MSDOS display mode.

Some characters, e.g. the Japanese katakana syllabary, take up a single character cell and some characters, e.g. kanji (Chinese characters used in Japanese), take up two. These are called half-width and full-width characters. This property is fixed and cannot be overridden for particular locales.

For more information on returned widths, see Unicode Technical Report 11 on East Asian Width available at: http://www.unicode.org/unicode/reports/tr11/

Return value

TCjkWidth

The notional width of an East Asian character.


Compose()

static IMPORT_C TBool Compose(TUint &aResult, const TDesC16 &aSource);

Description

Composes a string of Unicode characters to produce a single character result.

For example, 0061 ('a') and 030A (combining ring above) compose to give 00E5 ('a' with ring above).

A canonical decomposition is a relationship between a string of characters - usually a base character and one or more diacritics - and a composed character. The Unicode standard requires that compliant software treats composed characters identically with their canonical decompositions. The mappings used by these functions are fixed and cannot be overridden for particular locales.

Parameters

TUint &aResult

If successful, the composed character value. If unsuccessful, this value contains 0xFFFF.

const TDesC16 &aSource

String of source Unicode characters.

Return value

TBool

True, if the compose operation is successful in combining the entire sequence of characters in the descriptor into a single compound character; false, otherwise.
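The calling convention described above (a boolean result plus an output value that is set to 0xFFFF on failure) can be sketched with a one-entry lookup table. ComposeSketch() and CompositionTable() are hypothetical names; a real implementation would draw on the full Unicode composition data rather than this single example.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Tiny illustrative table holding only the example from the text.
inline const std::map<std::u16string, std::uint32_t>& CompositionTable()
{
    static const std::map<std::u16string, std::uint32_t> table = {
        { u"a\u030A", 0x00E5u },  // 'a' + combining ring above
    };
    return table;
}

// Succeeds only if the whole source string composes to one character.
inline bool ComposeSketch(std::uint32_t& aResult, const std::u16string& aSource)
{
    const auto& table = CompositionTable();
    const auto it = table.find(aSource);
    if (it == table.end())
    {
        aResult = 0xFFFFu;  // documented failure value
        return false;
    }
    aResult = it->second;
    return true;
}
```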


Decompose()

IMPORT_C TBool Decompose(TPtrC16 &aResult) const;

Description

Maps this character to its maximal canonical decomposition.

For example, 01E1 ('a' with dot above and macron) decomposes into 0061 ('a') 0307 (dot) and 0304 (macron).

Note that this function is used during collation, as performed by the Mem::CompareC() function, to convert the compared strings to their maximal canonical decompositions.

Parameters

TPtrC16 &aResult

If successful, the descriptor represents the canonical decomposition of this character. If unsuccessful, the descriptor is empty.

Return value

TBool

True if decomposition is successful; false, otherwise.
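What "maximal" means can be sketched by expanding single-step decompositions recursively until nothing decomposes further. The two table entries below are the Unicode canonical decompositions behind the example in the text; the function names are hypothetical and the table is deliberately incomplete.

```cpp
#include <map>
#include <string>

// Single-step canonical decompositions for two characters only.
inline const std::map<char32_t, std::u32string>& SingleStepTable()
{
    static const std::map<char32_t, std::u32string> table = {
        { 0x01E1, U"\u0227\u0304" },  // a with dot above and macron
        { 0x0227, U"a\u0307" },       // a with dot above
    };
    return table;
}

// Expands a character until no part of it decomposes any further.
inline std::u32string ExpandFully(char32_t aChar)
{
    const auto& table = SingleStepTable();
    const auto it = table.find(aChar);
    if (it == table.end())
        return std::u32string(1, aChar);

    std::u32string result;
    for (char32_t part : it->second)
        result += ExpandFully(part);
    return result;
}

// Follows the documented convention: false and an empty result
// when the character has no decomposition.
inline bool DecomposeSketch(std::u32string& aResult, char32_t aChar)
{
    if (SingleStepTable().count(aChar) == 0)
    {
        aResult.clear();
        return false;
    }
    aResult = ExpandFully(aChar);
    return true;
}
```

For 0x01E1 the first step gives 0227 0304, and expanding 0227 in turn gives the three-character sequence 0061 0307 0304.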


SetChar()

protected: inline void SetChar(TUint aChar);

Description

Parameters

TUint aChar


Member structures


Struct TCharInfo

struct TCharInfo;

Description

A structure to hold information about a Unicode character.

An object of this type is passed to TChar::GetInfo().

Members

Defined in TChar::TCharInfo:
iBdCategory, iCategory, iCombiningClass, iLowerCase, iMirrored, iNumericValue, iTitleCase, iUpperCase

Member data


iCategory

TCategory iCategory;

Description

General category.


iBdCategory

TBdCategory iBdCategory;

Description

Bi-directional category.


iCombiningClass

TInt iCombiningClass;

Description

Combining class: number (currently) in the range 0..234.


iLowerCase

TUint iLowerCase;

Description

Lower case form.


iUpperCase

TUint iUpperCase;

Description

Upper case form.


iTitleCase

TUint iTitleCase;

Description

Title case form.


iMirrored

TBool iMirrored;

Description

True, if the character is mirrored.


iNumericValue

TInt iNumericValue;

Description

Integer numeric value: -1 if none, -2 if a fraction.


Member enumerations


Enum TCategory

TCategory

Description

General Unicode character category.

The high nibble encodes the major category (Mark, Number, etc.) and the low nibble encodes the subdivisions of that category.

The category codes can be used in three ways:

(i) as unique constants: there is one for each Unicode category, with a name of the form

    E<XX>Category

where

    <XX>

is the category name given by the Unicode database (e.g., the constant ELuCategory is used for uppercase letters, category Lu);

(ii) as numbers in certain ranges: letter categories are all <= EMaxLetterCategory;

(iii) as codes in which the upper nibble gives the category group (e.g., punctuation categories all yield TRUE for the test (category & 0xF0) == EPunctuationGroup).
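The range test in (ii) and the group test in (iii) can be sketched with a small enum laid out in the same high-nibble/low-nibble style. All names and numeric values below are hypothetical and chosen only to show the layout; consult e32cmn.h for the real TCategory values.

```cpp
// Hypothetical values: high nibble is the group, low nibble the subdivision.
enum SketchCategory
{
    kAlphaGroup = 0x00,
    kLetterOtherGroup = 0x10,
    kPunctuationGroup = 0x50,

    kLu = kAlphaGroup | 0,
    kLl = kAlphaGroup | 1,
    kLt = kAlphaGroup | 2,
    kLo = kLetterOtherGroup | 0,
    kMaxLetter = kLetterOtherGroup | 0x0F,

    kPc = kPunctuationGroup | 0,
    kPd = kPunctuationGroup | 1
};

// (iii) Masking off the low nibble leaves the group code.
inline int GroupOf(int aCategory)
{
    return aCategory & 0xF0;
}

inline bool IsPunctuationSketch(int aCategory)
{
    return GroupOf(aCategory) == kPunctuationGroup;
}

// (ii) Letter categories sort at or below the maximum letter category.
inline bool IsLetterSketch(int aCategory)
{
    return aCategory <= kMaxLetter;
}
```

Both tests avoid enumerating every category: one comparison or one mask covers a whole family of categories.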

EAlphaGroup

Alphabetic letters.

Includes ELuCategory, ELlCategory and ELtCategory.

ELetterOtherGroup

Other letters.

Includes ELoCategory.

ELetterModifierGroup

Letter modifiers.

Includes ELmCategory.

EMarkGroup

Marks group.

Includes EMnCategory, EMcCategory and EMeCategory.

ENumberGroup

Numbers group.

Includes ENdCategory, ENlCategory and ENoCategory.

EPunctuationGroup

Punctuation group.

Includes EPcCategory, EPdCategory, EPsCategory, EPeCategory, EPiCategory, EPfCategory and EPoCategory.

ESymbolGroup

Symbols group.

Includes ESmCategory, EScCategory, ESkCategory and ESoCategory.

ESeparatorGroup

Separators group.

Includes EZsCategory, EZlCategory and EZpCategory.

EControlGroup

Control, format, private use, unassigned.

Includes ECcCategory, ECfCategory, ECsCategory, ECoCategory and ECnCategory.

EMaxAssignedGroup

The highest possible group category.

EUnassignedGroup

Unassigned to any other group.

ELuCategory

Letter, Uppercase.

ELlCategory

Letter, Lowercase.

ELtCategory

Letter, Titlecase.

ELoCategory

Letter, Other.

EMaxLetterCategory

The highest possible (non-modifier) letter category.

ELmCategory

Letter, Modifier.

EMaxLetterOrLetterModifierCategory

The highest possible letter category.

EMnCategory

Mark, Non-Spacing.

EMcCategory

Mark, Spacing Combining.

EMeCategory

Mark, Enclosing.

ENdCategory

Number, Decimal Digit.

ENlCategory

Number, Letter.

ENoCategory

Number, Other.

EPcCategory

Punctuation, Connector.

EPdCategory

Punctuation, Dash.

EPsCategory

Punctuation, Open.

EPeCategory

Punctuation, Close.

EPiCategory

Punctuation, Initial Quote.

EPfCategory

Punctuation, Final Quote.

EPoCategory

Punctuation, Other.

ESmCategory

Symbol, Math.

EScCategory

Symbol, Currency.

ESkCategory

Symbol, Modifier.

ESoCategory

Symbol, Other.

EMaxGraphicCategory

The highest possible graphic character category.

EZsCategory

Separator, Space.

EMaxPrintableCategory

The highest possible printable character category.

EZlCategory

Separator, Line.

EZpCategory

Separator, Paragraph.

ECcCategory

Other, Control.

ECfCategory

Other, Format.

EMaxAssignedCategory

The highest possible category for assigned 16-bit characters; does not include surrogates, which are interpreted as pairs and have no meaning on their own.

ECsCategory

Other, Surrogate.

ECoCategory

Other, Private Use.

ECnCategory

Other, Not Assigned.


Enum TBdCategory

TBdCategory

Description

The bi-directional Unicode character category.

For more information on the bi-directional algorithm, see Unicode Technical Report No. 9 available at: http://www.unicode.org/unicode/reports/tr9.

ELeftToRight

Left to right.

ELeftToRightEmbedding

Left to right embedding.

ELeftToRightOverride

Left-to-Right Override.

ERightToLeft

Right to left.

ERightToLeftArabic

Right to left Arabic.

ERightToLeftEmbedding

Right to left embedding.

ERightToLeftOverride

Right-to-Left Override.

EPopDirectionalFormat

Pop Directional Format.

EEuropeanNumber

European number.

EEuropeanNumberSeparator

European number separator.

EEuropeanNumberTerminator

European number terminator.

EArabicNumber

Arabic number.

ECommonNumberSeparator

Common number separator.

ENonSpacingMark

Non Spacing Mark.

EBoundaryNeutral

Boundary Neutral.

EParagraphSeparator

Paragraph Separator.

ESegmentSeparator

Segment separator.

EWhitespace

Whitespace.

EOtherNeutral

Other neutrals; all other characters: punctuation, symbols.


Enum TCjkWidth

TCjkWidth

Description

Notional character width as known to East Asian (Chinese, Japanese, Korean (CJK)) coding systems.

ENeutralWidth

Includes 'ambiguous width' defined in Unicode Technical Report 11: East Asian Width.

EHalfWidth

Character which occupies a single cell.

EFullWidth

Character which occupies 2 cells.

ENarrow

Characters that are always narrow and have explicit full-width counterparts. All of ASCII is an example of East Asian Narrow characters.

EWide

Characters that are always wide. This category includes characters that have explicit half-width counterparts.
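On a fixed-width grid, these widths translate into a cell count for a run of text. The sketch below uses a hypothetical CellWidth enum in place of TCjkWidth; treating wide characters as two cells, narrow characters as one cell and neutral-width characters as one cell are assumptions of this sketch rather than documented behaviour.

```cpp
#include <vector>

// Hypothetical stand-in for the width values; not the real TCjkWidth.
enum class CellWidth { Neutral, Half, Full, Narrow, Wide };

// Number of grid cells occupied by one character of the given width.
inline int CellsFor(CellWidth aWidth)
{
    switch (aWidth)
    {
        case CellWidth::Full:
        case CellWidth::Wide:
            return 2;
        default:
            return 1;  // Half and Narrow; Neutral by assumption
    }
}

// Total number of cells for a sequence of characters.
inline int TotalCells(const std::vector<CellWidth>& aWidths)
{
    int total = 0;
    for (CellWidth width : aWidths)
        total += CellsFor(width);
    return total;
}
```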


Enum TEncoding

TEncoding

Description

Encoding systems used by the translation functions.

EUnicode

The Unicode encoding.

EShiftJIS

The shift-JIS encoding (used in Japan).


Enum anonymous

n/a

Description

Flags defining operations to be performed using TChar::Fold().

The flag values are passed to the Fold() function.

EFoldCase

Convert characters to their lower case form if any.

EFoldAccents

Strip accents.

EFoldDigits

Convert digits representing values 0..9 to characters '0'..'9'.

EFoldSpaces

Convert all spaces (ordinary, fixed-width, ideographic, etc.) to ' '.

EFoldKana

Convert hiragana to katakana.

EFoldWidth

Fold fullwidth and halfwidth variants to their standard forms.

EFoldStandard

Perform standard folding operations, i.e. those done by Fold() with no argument.

EFoldAll

Perform all possible folding operations.