Character conversion to and from Unicode

The class CCnvCharacterSetConverter is used for converting text between Unicode (UCS-2) and other character sets. Sixteen-bit descriptors are used to hold text encoded in UCS-2, and eight-bit descriptors are used to hold text encoded in all other character sets — whether these are encoded as a single byte per character, multiple bytes, or a variable number of bytes.

The first stage of the conversion is to define the non-Unicode character set to be converted to or from, and the second is to convert the text. Selecting the character set to be converted to or from is done using one of the overloads of PrepareToConvertToOrFromL(). Text conversion itself is done using one of the overloads of ConvertFromUnicode() or ConvertToUnicode().

The conversion functions allow piece-by-piece conversion of input strings. Callers do not have to try to guess how big to make the output descriptor for a given input descriptor; they can simply do the conversion in a loop using a small output descriptor. The ability to cope with an input descriptor being truncated is useful if the caller cannot guarantee whether the input descriptor will be complete, e.g. if they themselves are receiving it in chunks from an external source.

The conversion does not fail if characters are unknown, which may happen if the input character is not valid in its own character set — unlikely in the case of Unicode — or because there is no equivalent in the selected output character set. Instead, unknown characters are replaced with a default character, or a character which may be specified using one of the utility functions.

There is no direct way to convert between two non-Unicode character sets. Though it is possible to convert from one character set to Unicode and then from Unicode to another character set.