class CCnvCharacterSetConverter : public CBase;
Converts text between Unicode and other character sets.
The first stage of the conversion is to specify the non-Unicode character set being converted to or from. This is done by calling one of the overloads of CCnvCharacterSetConverter::PrepareToConvertToOrFromL(TUint,const CArrayFix< SCharacterSet > &,RFs &).
The second stage is to convert the text, using one of the overloads of CCnvCharacterSetConverter::ConvertFromUnicode(TDes8 &,const TDesC16 &)const or CCnvCharacterSetConverter::ConvertToUnicode(TDes16 &,const TDesC8 &,TInt &)const.
Where possible, the overload PrepareToConvertToOrFromL(TUint,RFs &) should be used, because it simply returns whether the character set is available or not available, whereas the overload PrepareToConvertToOrFromL(TUint,const CArrayFix< SCharacterSet > &,RFs &) panics if the specified character set is not available. However, if the conversions are to be performed often, or if the user must select the character set for the conversion from a list, the array-taking overload may be more appropriate.
The overload that takes only a UID is less efficient than the array-taking overload, because it searches through the file system for the selected character set every time it is invoked. The array-taking overload searches through an array of all available character sets. With this method, the file system need only be searched once, when CCnvCharacterSetConverter::CreateArrayOfCharacterSetsAvailableLC(RFs &) or CCnvCharacterSetConverter::CreateArrayOfCharacterSetsAvailableL(RFs &) is used to create the array.
The conversion functions allow users of this class to perform partial conversions on an input descriptor, handling the situation where the input descriptor is truncated midway through a multi-byte character. This means that you do not have to guess how big to make the output descriptor for a given input descriptor; you can simply do the conversion in a loop using a small output descriptor. The ability to handle truncated descriptors also allows users of the class to convert information received in chunks from an external source.
The class also provides a number of utility functions.
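The chunked conversion loop described above can be sketched in standard C++. This is a minimal model, not Symbian code: ConvertChunk and ConvertStream are hypothetical names, UTF-8 stands in for the foreign character set, and the sketch assumes that a conversion call reports how many trailing input bytes it left unconsumed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for one conversion call. Decodes as much UTF-8 from
// `in` as fits into `capacity` UTF-16 code units, stores it in `out`, and
// returns the number of trailing input bytes that were not consumed.
// Only 1- to 3-byte sequences are decoded; continuation bytes are not
// validated, and any other lead byte becomes U+FFFD.
std::size_t ConvertChunk(std::u16string& out, std::size_t capacity,
                         const std::string& in) {
    out.clear();
    std::size_t i = 0;
    while (i < in.size() && out.size() < capacity) {
        const unsigned char b0 = static_cast<unsigned char>(in[i]);
        std::size_t len = 1;
        if (b0 >= 0xC2 && b0 <= 0xDF) len = 2;
        else if (b0 >= 0xE0 && b0 <= 0xEF) len = 3;
        if (i + len > in.size()) break;  // truncated character: keep it for later
        char16_t unit;
        if (len == 1) {
            unit = b0 < 0x80 ? static_cast<char16_t>(b0) : char16_t{0xFFFD};
        } else if (len == 2) {
            unit = static_cast<char16_t>(((b0 & 0x1F) << 6) | (in[i + 1] & 0x3F));
        } else {
            unit = static_cast<char16_t>(((b0 & 0x0F) << 12) |
                                         ((in[i + 1] & 0x3F) << 6) |
                                         (in[i + 2] & 0x3F));
        }
        out.push_back(unit);
        i += len;
    }
    return in.size() - i;
}

// Converts input that arrives in chunks, using a small output buffer and
// carrying unconsumed bytes over to the next round.
std::u16string ConvertStream(const std::vector<std::string>& chunks,
                             std::size_t capacity) {
    std::u16string result;
    std::u16string buffer;
    std::string pending;
    for (const std::string& chunk : chunks) {
        pending += chunk;
        for (;;) {
            const std::size_t left = ConvertChunk(buffer, capacity, pending);
            result += buffer;
            const bool progressed = left < pending.size();
            pending.erase(0, pending.size() - left);
            // Stop when nothing is left or only a truncated character remains.
            if (!progressed || pending.empty()) break;
        }
    }
    return result;
}
```

Splitting a multi-byte character across two chunks does not lose it: the incomplete bytes stay in `pending` until the next chunk completes them. The check on `progressed` is what keeps the inner loop from spinning when only an incomplete character is available.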
CBase - Base class for all classes to be instantiated on the heap.
CCnvCharacterSetConverter - Converts text between Unicode and other character sets.
Defined in CCnvCharacterSetConverter:
AsciiConversionData() - Returns a ready-made SCnvConversionData object for converting between Unicode an...
AutoDetectCharSetL(TInt &,TUint &,const CArrayFix< SCharacterSet > &,const TDesC8 &) - Attempts to determine the character set of the sample text from those supported ...
AutoDetectCharacterSetL(TInt &,TUint &,const CArrayFix< SCharacterSet > &,const TDesC8 &) - Deprecated
ConvertCharacterSetIdentifierToMibEnumL(TUint,RFs &) - Converts the UID of a character set to its MIB enum value.
ConvertCharacterSetIdentifierToStandardNameL(TUint,RFs &) - Returns the Internet-standard name of a character set identified in Symbian OS b...
ConvertFromUnicode(TDes8 &,const TDesC16 &)const - Converts text encoded in the Unicode character set (UCS-2) into other character ...
ConvertFromUnicode(TDes8 &,const TDesC16 &,TArrayOfAscendingIndices &)const - Converts Unicode text into another character set.
ConvertFromUnicode(TDes8 &,const TDesC16 &,TInt &)const
ConvertFromUnicode(TDes8 &,const TDesC16 &,TInt &,TInt &)const - Converts text encoded in the Unicode character set (UCS-2) into other character ...
ConvertMibEnumOfCharacterSetToIdentifierL(TInt,RFs &) - Converts a MIB enum value to the UID value of the character set.
ConvertStandardNameOfCharacterSetToIdentifierL(const TDesC8 &,RFs &) - Gets the UID of a character set identified by its Internet-standard name (the ma...
ConvertToUnicode(TDes16 &,const TDesC8 &,TInt &)const - Converts text encoded in a non-Unicode character set into the Unicode character ...
ConvertToUnicode(TDes16 &,const TDesC8 &,TInt &,TInt &)const
ConvertToUnicode(TDes16 &,const TDesC8 &,TInt &,TInt &,TInt &)const - Converts text encoded in a non-Unicode character set into the Unicode character ...
ConvertibleToCharSetL(TInt &,const TUint,const CArrayFix< SCharacterSet > &,const TDesC8 &) - Given a character set UID aCharacterSetIdentifier, ConvertibleToCharacterSetL re...
ConvertibleToCharacterSetL(TInt &,const TUint,const CArrayFix< SCharacterSet > &,const TDesC8 &) - Deprecated
CreateArrayOfCharacterSetsAvailableL(RFs &) - Creates an array identifying all the character sets for which conversion is avai...
CreateArrayOfCharacterSetsAvailableLC(RFs &) - Creates an array identifying all the character sets for which conversion is avai...
DoConvertFromUnicode(const SCnvConversionData &,TEndianness,const TDesC8 &,TDes8 &,const TDesC16 &,TArrayOfAscendingIndices &) - Converts Unicode text into another character set. The Unicode text specified in ...
DoConvertFromUnicode(const SCnvConversionData &,TEndianness,const TDesC8 &,TDes8 &,const TDesC16 &,TArrayOfAscendingIndices &,TUint &,TUint) - Converts Unicode text into another character set. The Unicode text specified in ...
DoConvertToUnicode(const SCnvConversionData &,TEndianness,TDes16 &,const TDesC8 &,TInt &,TInt &) - Converts non-Unicode text into Unicode. The non-Unicode text specified in aForei...
DoConvertToUnicode(const SCnvConversionData &,TEndianness,TDes16 &,const TDesC8 &,TInt &,TInt &,TUint &,TUint) - Converts non-Unicode text into Unicode. The non-Unicode text specified in aForei...
EAvailable - The requested character set can be converted.
EBigEndian - The character set is big-endian.
EDowngradeExoticLineTerminatingCharactersToCarriageReturnLineFeed - Paragraph/line separators should be downgraded (if necessary) into carriage retu...
EDowngradeExoticLineTerminatingCharactersToJustLineFeed - Paragraph/line separators should be downgraded (if necessary) into a line feed o...
EErrorIllFormedInput - The input descriptor contains a single corrupt character. This might occur when ...
EInputConversionFlagAllowTruncatedInputNotEvenPartlyConsumable - By default, when the input descriptor passed to CCnvCharacterSetConverter::DoCon...
EInputConversionFlagAppend - Appends the converted text to the output descriptor.
EInputConversionFlagAssumeStartInDefaultCharacterSet
EInputConversionFlagMustEndInDefaultCharacterSet - Appends the default character set Escape sequence at end of converted text
EInputConversionFlagStopAtFirstUnconvertibleCharacter - Stops converting when the first unconvertible character is reached.
ELittleEndian - The character set is little-endian.
ELowestThreshold - The lowest confidence value for a character set accepted by Autodetect
ENotAvailable - The requested character set cannot be converted.
EOutputConversionFlagInputIsTruncated - Indicates whether or not the source descriptor ends in a truncated sequence, e.g...
GetDowngradeForExoticLineTerminatingCharacters()
KStateDefault
NewL() - Allocates and constructs a CCnvCharacterSetConverter object. If there is insuffi...
NewLC() - Allocates and constructs a CCnvCharacterSetConverter object, and leaves the obje...
PrepareToConvertToOrFromL(TUint,RFs &) - Specifies the character set to convert to or from. aCharacterSetIdentifier is a ...
PrepareToConvertToOrFromL(TUint,const CArrayFix< SCharacterSet > &,RFs &) - Specifies the character set to convert to or from. aCharacterSetIdentifier is a ...
SCharacterSet - Stores information about a non-Unicode character set. The information is used to...
SetDefaultEndiannessOfForeignCharacters(TEndianness) - Sets the default endian-ness used by the CCnvCharacterSetConverter::ConvertFromU...
SetDowngradeForExoticLineTerminatingCharacters(TDowngradeForExoticLineTerminatingCharacters) - Sets whether the Unicode 'line separator' and 'paragraph separator' characters (...
SetMaxCacheSize(TInt) - The method sets the max size of the internal character set converter cache. The ...
SetReplacementForUnconvertibleUnicodeCharactersL(const TDesC8 &) - Sets the character used to replace unconvertible characters in the output descri...
TArrayOfAscendingIndices - Holds an ascending array of the indices of the characters in the source Unicode ...
TAvailability - Indicates whether a character set is available or unavailable for conversion. ...
TDowngradeForExoticLineTerminatingCharacters - Downgrade for line and paragraph separators
TEndianness - Specifies the default endian-ness of the current character set. Used by CCnvChar...
TError - Conversion error flags. At this stage there is only one error flag- others may b...
anonymous - Output flag used to indicate whether or not a character in the source descriptor...
anonymous
anonymous - Initial value for the state argument in a set of related calls to CCnvCharacterS...
anonymous
~CCnvCharacterSetConverter() - The destructor frees all resources owned by the object, prior to its destruction...
Inherited from CBase:
Delete(CBase *) - Deletes the specified object.
Extension_(TUint,TAny *&,TAny *) - Extension function
operator new(TUint) - Allocates the object from the heap and then initialises its contents to binary z...
operator new(TUint,TAny *) - Initialises the object to binary zeroes.
operator new(TUint,TLeave) - Allocates the object from the heap and then initialises its contents to binary z...
operator new(TUint,TLeave,TUint) - Allocates the object from the heap and then initialises its contents to binary z...
operator new(TUint,TUint) - Allocates the object from the heap and then initialises its contents to binary z...
IMPORT_C static CCnvCharacterSetConverter* NewL();
Allocates and constructs a CCnvCharacterSetConverter object. If there is insufficient memory to create the object, the function leaves.
Since the memory is allocated on the heap, objects of this type should be destroyed using the delete operator when the required conversions are complete.
IMPORT_C static CCnvCharacterSetConverter* NewLC();
Allocates and constructs a CCnvCharacterSetConverter object, and leaves the object on the cleanup stack. If there is insufficient memory to create the object, the function leaves.
Since the memory is allocated on the heap, objects of this type should be destroyed using either the CleanupStack::Pop() function and then the delete operator, or the CleanupStack::PopAndDestroy() function.
IMPORT_C virtual ~CCnvCharacterSetConverter();
The destructor frees all resources owned by the object, prior to its destruction.
IMPORT_C static CArrayFix< SCharacterSet >* CreateArrayOfCharacterSetsAvailableL(RFs &aFileServerSession);
Creates an array identifying all the character sets for which conversion is available. These can be character sets for which conversion is built into Symbian OS, or they may be character sets for which conversion is implemented by a plug-in DLL.
The array returned can be used by one of the CCnvCharacterSetConverter::PrepareToConvertToOrFromL(TUint,const CArrayFix< SCharacterSet > &,RFs &) overloads to provide a list of all the character sets available for conversion. The caller of this function is responsible for deleting the array, and should not modify it.
Not all encoders returned will be suitable for conversion from Unicode. Such encoders have no name and no MIB enum and so will generally not be understood by a receiving process. The function ConvertCharacterSetIdentifierToMibEnumL can be used to determine whether this is the case or not.
IMPORT_C static CArrayFix< SCharacterSet >* CreateArrayOfCharacterSetsAvailableLC(RFs &aFileServerSession);
Creates an array identifying all the character sets for which conversion is available and pushes a pointer to it onto the cleanup stack. These can be character sets for which conversion is built into Symbian OS, or they may be character sets for which conversion is implemented by a plug-in DLL.
The array returned can be used by one of the CCnvCharacterSetConverter::PrepareToConvertToOrFromL(TUint,const CArrayFix< SCharacterSet > &,RFs &) overloads to provide a list of all the character sets available for conversion. The caller of this function is responsible for deleting the array, and should not modify it.
Not all encoders returned will be suitable for conversion from Unicode. Such encoders have no name and no MIB enum and so will generally not be understood by a receiving process. The function ConvertCharacterSetIdentifierToMibEnumL can be used to determine whether this is the case or not.
This is a static function which uses ECOM functionality. It cleans up ECOM by calling FinalClose().
IMPORT_C TUint ConvertStandardNameOfCharacterSetToIdentifierL(const TDesC8 &aStandardNameOfCharacterSet, RFs &aFileServerSession);
Gets the UID of a character set identified by its Internet-standard name (the matching is case-insensitive).
If the character set specified is not one for which Symbian OS provides built-in conversion, the function searches the file system for plug-ins which implement the conversion and which provide the name-to-UID mapping information.
IMPORT_C HBufC8* ConvertCharacterSetIdentifierToStandardNameL(TUint aCharacterSetIdentifier, RFs &aFileServerSession);
Returns the Internet-standard name of a character set identified in Symbian OS by a UID.
If the character set specified is not one for which Symbian OS provides built-in conversion, the file system is searched for plug-ins which implement the conversion, hence the need for a file server session.
IMPORT_C TUint ConvertMibEnumOfCharacterSetToIdentifierL(TInt aMibEnumOfCharacterSet, RFs &aFileServerSession);
Converts a MIB enum value to the UID value of the character set.
If the character set identified is not one for which Symbian OS provides built-in conversion, the function searches the file system for plug-ins which implement the conversion and which provide the MIB enum-to-UID mapping information.
IMPORT_C TInt ConvertCharacterSetIdentifierToMibEnumL(TUint aCharacterSetIdentifier, RFs &aFileServerSession);
Converts the UID of a character set to its MIB enum value.
If the character set identified is not one for which Symbian OS provides built-in conversion, the function searches the file system for plug-ins which implement the conversion and which provide the UID-to-MIB enum mapping information.
IMPORT_C void PrepareToConvertToOrFromL(TUint aCharacterSetIdentifier, const CArrayFix< SCharacterSet > &aArrayOfCharacterSetsAvailable, RFs &aFileServerSession);
Specifies the character set to convert to or from. aCharacterSetIdentifier is a UID which identifies a character set. It can be one of the character sets for which conversion is built into Symbian OS, or it may be a character set for which the conversion is implemented by a plug-in DLL.
The function searches the character set array specified (aArrayOfCharacterSetsAvailable). This is an array containing all of the character sets for which conversion is available. It is created by calling CCnvCharacterSetConverter::CreateArrayOfCharacterSetsAvailableL(RFs &) or CCnvCharacterSetConverter::CreateArrayOfCharacterSetsAvailableLC(RFs &). You should be sure that conversion is available for aCharacterSetIdentifier, because a panic occurs if it is not. If you cannot be sure, use the other overload of this function.
Either this function or its overload must be called before using the conversion functions CCnvCharacterSetConverter::ConvertFromUnicode(TDes8 &,const TDesC16 &)const or CCnvCharacterSetConverter::ConvertToUnicode(TDes16 &,const TDesC8 &,TInt &)const.
Unlike the other overload, this function does not search the file system for plug-in conversion DLLs (unless aArrayOfCharacterSetsAvailable is NULL). This function should be used if conversions are to be performed often, or if the conversion character set is to be selected by the user. Generating the array of all the available character sets once and searching through it is more efficient than the method used by the other overload, in which the file system may be searched every time it is invoked.
Notes:
The file server session argument is used to open the required character set conversion data file.
The array passed to this function can also be used to provide a list from which a user can select the desired conversion character set.
IMPORT_C TAvailability PrepareToConvertToOrFromL(TUint aCharacterSetIdentifier, RFs &aFileServerSession);
Specifies the character set to convert to or from. aCharacterSetIdentifier is a UID which identifies a character set. It can be one of the character sets for which conversion is built into Symbian OS, or it may be a character set for which conversion is implemented by a plug-in DLL. In the latter case, the function searches through the file system for the DLL which implements the character conversion.
Either this function or its overload must be called before using the conversion functions CCnvCharacterSetConverter::ConvertFromUnicode(TDes8 &,const TDesC16 &)const or CCnvCharacterSetConverter::ConvertToUnicode(TDes16 &,const TDesC8 &,TInt &)const.
This overload of the function is simpler to use than the other and does not panic if the character set with the specified UID is not available at run time; it simply returns ENotAvailable. It should be used when the conversion character set is specified within the text object being converted, e.g. an email message or an HTML document. If the character set is not specified, the user must be presented with a list of all available sets, so it makes sense to use the other overload.
The function may need to search the file system each time it is called. If conversion takes place repeatedly over a short period, it may be more efficient to use the other overload.
Notes:
Although the other overload of this function is more efficient, if the character set is one for which conversion is built into Symbian OS, the difference in speed is negligible.
IMPORT_C void SetDefaultEndiannessOfForeignCharacters(TEndianness aEndianness);
Sets the default endian-ness used by the CCnvCharacterSetConverter::ConvertFromUnicode(TDes8 &,const TDesC16 &)const and CCnvCharacterSetConverter::ConvertToUnicode(TDes16 &,const TDesC8 &,TInt &)const functions to convert between Unicode and non-Unicode character sets.
The endian-ness of a multi-byte character set may be defined in the character set definition or, as in the case of UCS-2, be operating system dependent. If the endian-ness of the current character set is defined by the character set itself, then the default endian-ness specified by this function is ignored.
Notes:
The issue of endian-ness does not apply to single byte character sets as there is no byte order.
This function should be called (if at all) after calling CCnvCharacterSetConverter::PrepareToConvertToOrFromL(TUint,const CArrayFix< SCharacterSet > &,RFs &) and before calling CCnvCharacterSetConverter::ConvertFromUnicode(TDes8 &,const TDesC16 &)const and/or CCnvCharacterSetConverter::ConvertToUnicode(TDes16 &,const TDesC8 &,TInt &)const.
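To make the byte-order question concrete, the following standard C++ sketch writes 16-bit code units in either order. It is an illustration only: ByteOrder and SerializeUcs2 are hypothetical names, not part of the class, and the sketch models nothing beyond the ordering of the two bytes.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for the converter's default endian-ness setting.
enum class ByteOrder { Little, Big };

// Writes each 16-bit code unit as two bytes in the requested order.
std::vector<std::uint8_t> SerializeUcs2(const std::u16string& text,
                                        ByteOrder order) {
    std::vector<std::uint8_t> bytes;
    bytes.reserve(text.size() * 2);
    for (char16_t unit : text) {
        const std::uint8_t high = static_cast<std::uint8_t>(unit >> 8);
        const std::uint8_t low = static_cast<std::uint8_t>(unit & 0xFF);
        if (order == ByteOrder::Big) {
            bytes.push_back(high);  // most significant byte first
            bytes.push_back(low);
        } else {
            bytes.push_back(low);   // least significant byte first
            bytes.push_back(high);
        }
    }
    return bytes;
}
```

For U+20AC this yields the bytes 0x20 0xAC in big-endian order and 0xAC 0x20 in little-endian order, which is why a receiver must know which default was used when the character set itself does not define one.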
IMPORT_C void SetDowngradeForExoticLineTerminatingCharacters(TDowngradeForExoticLineTerminatingCharacters aDowngradeForExoticLineTerminatingCharacters);
Sets whether the Unicode 'line separator' and 'paragraph separator' characters (0x2028 and 0x2029 respectively) should be converted into a carriage return / line feed pair, or into a line feed only when converting from Unicode into a foreign character set. This applies to all foreign character sets that do not contain a direct equivalent of these Unicode character codes.
By default, line and paragraph separators are converted into a CR/LF pair. This function should be called (if at all) after calling CCnvCharacterSetConverter::PrepareToConvertToOrFromL(TUint,const CArrayFix< SCharacterSet > &,RFs &) and before calling CCnvCharacterSetConverter::ConvertFromUnicode(TDes8 &,const TDesC16 &)const and/or CCnvCharacterSetConverter::ConvertToUnicode(TDes16 &,const TDesC8 &,TInt &)const.
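The two downgrade modes can be modelled in a few lines of standard C++. This is a sketch under stated assumptions: Downgrade and DowngradeSeparators are hypothetical names, and the target character set is assumed to have no direct equivalent of U+2028 or U+2029, so every separator is downgraded.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the two downgrade modes described above.
enum class Downgrade { ToCarriageReturnLineFeed, ToJustLineFeed };

// Replaces the Unicode line separator (U+2028) and paragraph separator
// (U+2029) with either CR/LF or a single LF; other characters pass through.
std::u16string DowngradeSeparators(const std::u16string& text, Downgrade mode) {
    std::u16string out;
    out.reserve(text.size());
    for (char16_t c : text) {
        if (c == u'\u2028' || c == u'\u2029') {
            if (mode == Downgrade::ToCarriageReturnLineFeed) {
                out.push_back(u'\r');
            }
            out.push_back(u'\n');
        } else {
            out.push_back(c);
        }
    }
    return out;
}
```

Note that the CR/LF mode can make the output longer than the input, which matters when sizing the output descriptor.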
IMPORT_C void SetReplacementForUnconvertibleUnicodeCharactersL(const TDesC8 &aReplacementForUnconvertibleUnicodeCharacters);
Sets the character used to replace unconvertible characters in the output descriptor, when converting from Unicode into another character set.
The default replacement for unconvertible Unicode characters is specified in the conversion data for the character set. The replacement text which is set using this function overrides the default value.
Notes:
If the replacement character is multi-byte, and its endian-ness is undefined in the character set, then its byte order is taken by default to be little-endian.
CCnvCharacterSetConverter::PrepareToConvertToOrFromL(TUint,const CArrayFix< SCharacterSet > &,RFs &) undoes the effect of any previous calls to this function. So, to have any effect, this function should be called between the CCnvCharacterSetConverter::PrepareToConvertToOrFromL(TUint,const CArrayFix< SCharacterSet > &,RFs &) call and the subsequent CCnvCharacterSetConverter::ConvertFromUnicode(TDes8 &,const TDesC16 &)const call or calls.
The value only applies when converting from Unicode to another character set. In Unicode, the code for 'unknown character' is always 0xFFFD.
IMPORT_C TInt ConvertFromUnicode(TDes8 &aForeign, const TDesC16 &aUnicode) const;
Converts text encoded in the Unicode character set (UCS-2) into other character sets.
The first overload of the function simply performs the conversion. The second overload converts the text and gets the number of characters that could not be converted. The third overload converts the text, gets the number of characters that could not be converted, and also gets the index of the first character that could not be converted. A fourth overload was introduced in v6.0; see below.
All overloads cause a panic if no target character set has been selected to convert to (i.e. either overload of CCnvCharacterSetConverter::PrepareToConvertToOrFromL(TUint,const CArrayFix< SCharacterSet > &,RFs &) must have been successfully called beforehand). You may also need to call CCnvCharacterSetConverter::SetDefaultEndiannessOfForeignCharacters(TEndianness) to define the endian-ness of the output descriptor.
Notes:
A sixteen-bit descriptor is used to hold the source Unicode encoded text, and an eight-bit descriptor is used to hold the converted non-Unicode text. Eight-bit descriptors are used because non-Unicode character sets may use a single byte per character (e.g. Code Page 1252) or more than one byte per character (e.g. GB 2312-80) or even a variable number of bytes per character (e.g. Shift-JIS).
The function will not convert the whole input descriptor if the output descriptor is not long enough to hold all the text.
Unicode characters cannot be converted if there is no equivalent for them in the target character set. This does not stop the conversion; the missing character is simply replaced by the character in the target character set which represents unknown characters. This default unknown character can be changed using CCnvCharacterSetConverter::SetReplacementForUnconvertibleUnicodeCharactersL(const TDesC8 &).
IMPORT_C TInt ConvertFromUnicode(TDes8 &aForeign, const TDesC16 &aUnicode, TInt &aNumberOfUnconvertibleCharacters) const;
IMPORT_C TInt ConvertFromUnicode(TDes8 &aForeign, const TDesC16 &aUnicode, TInt &aNumberOfUnconvertibleCharacters, TInt &aIndexOfFirstUnconvertibleCharacter) const;
Converts text encoded in the Unicode character set (UCS-2) into other character sets.
The first overload of the function simply performs the conversion. The second overload converts the text and gets the number of characters that could not be converted. The third overload converts the text, gets the number of characters that could not be converted, and also gets the index of the first character that could not be converted. A fourth overload was introduced in v6, see below.
All overloads cause a panic if no target character set has been selected to convert to (i.e. either overload of CCnvCharacterSetConverter::PrepareToConvertToOrFromL(TUint,const CArrayFix< SCharacterSet > &,RFs &) must have been successfully called beforehand). You may also need to call CCnvCharacterSetConverter::SetDefaultEndiannessOfForeignCharacters(TEndianness) to define the endian-ness of the output descriptor.
Notes:
A sixteen-bit descriptor is used to hold the source Unicode encoded text, and an eight-bit descriptor is used to hold the converted non-Unicode text. Eight-bit descriptors are used because non-Unicode character sets may use a single byte per character (e.g. Code Page 1252) or more than one byte per character (e.g. GB 2312-80) or even a variable number of bytes per character (e.g. Shift-JIS).
The function will not convert the whole input descriptor if the output descriptor is not long enough to hold all the text.
Unicode characters cannot be converted if there is no equivalent for them in the target character set. This does not stop the conversion; the missing character is simply replaced by the character in the target character set which represents unknown characters. This default unknown character can be changed using CCnvCharacterSetConverter::SetReplacementForUnconvertibleUnicodeCharactersL(const TDesC8 &).
IMPORT_C TInt ConvertFromUnicode(TDes8 &aForeign, const TDesC16 &aUnicode, TArrayOfAscendingIndices &aIndicesOfUnconvertibleCharacters) const;
Converts Unicode text into another character set.
Differs from the other overloads of this function by returning the indices of all of the characters in the source Unicode text which could not be converted.
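The relationship between the replacement text and the reported indices can be sketched in standard C++. This is a model, not the library's implementation: ToLatin1 and Latin1Result are hypothetical names, ISO 8859-1 stands in for the target character set, and each UTF-16 code unit is treated as one character.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result type: the converted bytes plus the ascending indices
// of the source characters that had no equivalent in the target set.
struct Latin1Result {
    std::string bytes;
    std::vector<std::size_t> unconvertible;
};

// Converts UTF-16 code units to ISO 8859-1. A code unit above 0xFF has no
// equivalent, so `replacement` is written instead and its index is recorded.
// Conversion continues after an unconvertible character.
Latin1Result ToLatin1(const std::u16string& text, const std::string& replacement) {
    Latin1Result result;
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] <= 0xFF) {
            result.bytes.push_back(
                static_cast<char>(static_cast<unsigned char>(text[i])));
        } else {
            result.bytes += replacement;
            result.unconvertible.push_back(i);
        }
    }
    return result;
}
```

In this model the count reported by the second overload corresponds to `unconvertible.size()`, and the index reported by the third overload corresponds to `unconvertible.front()` when the list is not empty.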
IMPORT_C TInt ConvertToUnicode(TDes16 &aUnicode, const TDesC8 &aForeign, TInt &aState) const;
Converts text encoded in a non-Unicode character set into the Unicode character set (UCS-2).
The first overload of the function simply performs the conversion. The second overload converts the text and gets the number of bytes in the input string that could not be converted. The third overload converts the text, gets the number of bytes that could not be converted, and also gets the index of the first byte that could not be converted.
All overloads cause a panic if no source character set has been selected to convert from (i.e. either overload of CCnvCharacterSetConverter::PrepareToConvertToOrFromL(TUint,const CArrayFix< SCharacterSet > &,RFs &) must have been successfully called beforehand). You may also need to call CCnvCharacterSetConverter::SetDefaultEndiannessOfForeignCharacters(TEndianness) to define the endian-ness of the input descriptor.
Notes:
Since Unicode is intended to be the superset of all character sets, the function should usually report zero unconverted characters. Unconvertible characters will exist if the input descriptor contains illegal characters, i.e. values not in the selected non-Unicode character set.
The presence of illegal characters does not stop the conversion. The missing character is simply replaced by the Unicode character which represents unknown characters (0xFFFD).
If the source text consists solely of a character that is not complete, the function returns EErrorIllFormedInput. The reason for this is to prevent the possibility of the calling code getting into an infinite loop.
IMPORT_C TInt ConvertToUnicode(TDes16 &aUnicode, const TDesC8 &aForeign, TInt &aState, TInt &aNumberOfUnconvertibleCharacters) const;
IMPORT_C TInt ConvertToUnicode(TDes16 &aUnicode, const TDesC8 &aForeign, TInt &aState, TInt &aNumberOfUnconvertibleCharacters, TInt &aIndexOfFirstByteOfFirstUnconvertibleCharacter) const;
Converts text encoded in a non-Unicode character set into the Unicode character set (UCS-2).
The first overload of the function simply performs the conversion. The second overload converts the text and gets the number of bytes in the input string that could not be converted. The third overload converts the text, gets the number of bytes that could not be converted, and also gets the index of the first byte that could not be converted.
All overloads cause a panic if no source character set has been selected to convert from (i.e. either overload of CCnvCharacterSetConverter::PrepareToConvertToOrFromL(TUint,const CArrayFix< SCharacterSet > &,RFs &) must have been successfully called beforehand). You may also need to call CCnvCharacterSetConverter::SetDefaultEndiannessOfForeignCharacters(TEndianness) to define the endian-ness of the input descriptor.
Notes:
Since Unicode is intended to be the superset of all character sets, the function should usually report zero unconverted characters. Unconvertible characters will exist if the input descriptor contains illegal characters, i.e. values not in the selected non-Unicode character set.
The presence of illegal characters does not stop the conversion. The missing character is simply replaced by the Unicode character which represents unknown characters (0xFFFD).
If the source text consists solely of a character that is not complete, the function returns EErrorIllFormedInput. The reason for this is to prevent the possibility of the calling code getting into an infinite loop.
IMPORT_C static void AutoDetectCharacterSetL(TInt &aConfidenceLevel, TUint &aCharacterSetIdentifier, const CArrayFix< SCharacterSet > &aArrayOfCharacterSetsAvailable, const TDesC8 &aSample);
Deprecated
CCnvCharacterSetConverter::AutoDetectCharSetL(TInt &,TUint &,const CArrayFix< SCharacterSet > &,const TDesC8 &) - Attempts to determine the character set of the sample text from those supported ...
IMPORT_C void AutoDetectCharSetL(TInt &aConfidenceLevel, TUint &aCharacterSetIdentifier, const CArrayFix< SCharacterSet > &aArrayOfCharacterSetsAvailable, const TDesC8 &aSample);
Attempts to determine the character set of the sample text from those supported on the phone.
For each of the available character sets, its implementation of IsInThisCharacterSetL() is called. The character set which returns the highest confidence level (i.e. which generates the fewest 0xFFFD Unicode replacement characters) is returned in aCharacterSetIdentifier.
This function merely determines if the sample text is convertible with this converter: it does no textual analysis on the result. Therefore, this function is not capable of differentiating between very similar encodings (for example the different ISO 8859 variants).
Any code making use of this function should provide a way for the user to override the selection that this function makes.
Please note that the operation of this function is slow. It takes no account of the usual context that would be used in guessing a character set (for example, the language that is expected to be encoded or the transport used). For situations where such context is known, a faster, more accurate solution is advisable.
To improve the performance of autodetection, the size of the interface proxy cache (one by default) should be increased (see SetCharacterSetCacheSize()). However, the performance gain will not be visible during the first function call, because the character sets are loaded into the cache during that call. Once created, the cache is preserved until the CCnvCharacterSetConverter object is destroyed.
This is a static function which uses ECOM functionality. It cleans up ECOM by calling FinalClose().
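The selection rule described above (ask every candidate for a confidence level and keep the highest) can be sketched in standard C++. Everything here is hypothetical: the Candidate type, the identifiers 1 and 2, and the two scoring functions are illustrative stand-ins for each plug-in's IsInThisCharacterSetL(), not real Symbian UIDs or scoring rules.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical candidate: an identifier plus a scoring function that returns
// a confidence level from 0 to 100 for a sample.
struct Candidate {
    unsigned id;
    int (*confidence)(const std::string& sample);
};

// Illustrative score: percentage of bytes that are valid 7-bit ASCII.
int AsciiConfidence(const std::string& sample) {
    if (sample.empty()) return 0;
    std::size_t ok = 0;
    for (char c : sample) {
        if (static_cast<unsigned char>(c) < 0x80) ++ok;
    }
    return static_cast<int>(ok * 100 / sample.size());
}

// Illustrative score: every byte is a valid ISO 8859-1 character.
int Latin1Confidence(const std::string& sample) {
    return sample.empty() ? 0 : 100;
}

// Keeps the candidate with the highest confidence. On a tie the earlier
// candidate wins, so the order of the array matters.
bool AutoDetect(const std::vector<Candidate>& candidates,
                const std::string& sample,
                int& confidenceLevel, unsigned& identifier) {
    bool found = false;
    for (const Candidate& candidate : candidates) {
        const int score = candidate.confidence(sample);
        if (!found || score > confidenceLevel) {
            confidenceLevel = score;
            identifier = candidate.id;
            found = true;
        }
    }
    return found;
}
```

A pure ASCII sample scores 100 with both candidates in this model, which illustrates why convertibility alone cannot tell similar encodings apart and why the caller should let the user override the result.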
IMPORT_C static void ConvertibleToCharacterSetL(TInt &aConfidenceLevel, const TUint aCharacterSetIdentifier, const CArrayFix< SCharacterSet > &aArrayOfCharacterSetsAvailable, const TDesC8 &aSample);
Deprecated
CCnvCharacterSetConverter::ConvertibleToCharSetL(TInt &,const TUint,const CArrayFix< SCharacterSet > &,const TDesC8 &) - Given a character set UID aCharacterSetIdentifier, ConvertibleToCharacterSetL re...
IMPORT_C void ConvertibleToCharSetL(TInt &aConfidenceLevel, const TUint aCharacterSetIdentifier, const CArrayFix< SCharacterSet > &aArrayOfCharacterSetsAvailable, const TDesC8 &aSample);
Given a character set UID aCharacterSetIdentifier, this function reports in aConfidenceLevel the likelihood that aSample is encoded in that character set. It goes through the array of character sets aArrayOfCharacterSetsAvailable and searches for the character set matching aCharacterSetIdentifier. The character set's IsInThisCharacterSetL function is called to determine the probability of the sample being encoded in that character set.
This is a static function which uses ECOM functionality. It cleans up ECOM by calling FinalClose().
IMPORT_C void SetMaxCacheSize(TInt aSize);
Sets the maximum size of the internal character set converter cache. The cache is used mainly to improve the performance of CCnvCharacterSetConverter::AutoDetectCharSetL(TInt &,TUint &,const CArrayFix< SCharacterSet > &,const TDesC8 &) calls. It caches loaded converter implementations. The next time a specific implementation is needed, the cache is searched first, and if the implementation is already loaded, the cached copy is used.
CCnvCharacterSetConverter::SetMaxCacheSize(TInt) is used to limit the maximum cache size, because the loaded implementations may consume a lot of system resources (memory, for example). By default (if CCnvCharacterSetConverter::SetMaxCacheSize(TInt) is never called) the maximum cache size is limited to 32 entries.
Note: Setting a very small cache size will impact the overall performance of CHARCONV functions. If the chosen cache size is less than the number of existing character set converter implementations, there will be no performance gain, or the gain will fall far short of the client's expectations. For best performance the chosen cache size should be greater than or equal to the number of existing character set converter implementations.
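The following standard C++ sketch illustrates why a cache smaller than the number of converter implementations may give no gain at all. It assumes least-recently-used eviction and a detection pass that tries every converter in order; neither is confirmed by this documentation, and `ConverterCache` and `CountHits` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <list>

// Hypothetical bounded cache of "loaded converter" UIDs with
// least-recently-used eviction (the policy is an assumption).
class ConverterCache {
public:
    explicit ConverterCache(std::size_t maxSize) : maxSize_(maxSize) {
        assert(maxSize > 0);
    }

    // Returns true on a cache hit. On a miss, "loads" the converter and
    // evicts the least recently used entry if the cache is full.
    bool Load(std::uint32_t uid) {
        for (auto it = entries_.begin(); it != entries_.end(); ++it) {
            if (*it == uid) {
                // Move the entry to the front to mark it most recently used.
                entries_.splice(entries_.begin(), entries_, it);
                return true;
            }
        }
        if (entries_.size() == maxSize_) entries_.pop_back();
        entries_.push_front(uid);
        return false;
    }

private:
    std::size_t maxSize_;
    std::list<std::uint32_t> entries_;
};

// Simulates `rounds` detection passes, each trying all `converterCount`
// converters in the same order, and returns the number of cache hits.
int CountHits(std::size_t cacheSize, std::uint32_t converterCount, int rounds) {
    ConverterCache cache(cacheSize);
    int hits = 0;
    for (int r = 0; r < rounds; ++r) {
        for (std::uint32_t uid = 0; uid < converterCount; ++uid) {
            if (cache.Load(uid)) ++hits;
        }
    }
    return hits;
}
```

Under these assumptions, `CountHits(4, 5, 3)` never hits because each entry is evicted just before it is needed again, while `CountHits(5, 5, 3)` hits on every access after the first pass.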
|
|
CCnvCharacterSetConverter::AutoDetectCharSetL(TInt &,TUint &,const CArrayFix< SCharacterSet > &,const TDesC8 &)
Attempts to determine the character set of the sample text from those supported ...
DoConvertFromUnicode(const SCnvConversionData &,TEndianness,const TDesC8 &,TDes8 &,const TDesC16 &,TArrayOfAscendingIndices &)
IMPORT_C static TInt DoConvertFromUnicode(const SCnvConversionData &aConversionData, TEndianness aDefaultEndiannessOfForeignCharacters,
const TDesC8 &aReplacementForUnconvertibleUnicodeCharacters, TDes8 &aForeign, const TDesC16 &aUnicode, TArrayOfAscendingIndices
&aIndicesOfUnconvertibleCharacters);
Converts Unicode text into another character set. The Unicode text specified in aUnicode is converted using the conversion data object (aConversionData) provided by the plug-in for the foreign character set, and the converted text is returned in aForeign.
Note
This is a utility function that should only be called from a plug-in conversion library's implementation of CCnvCharacterSetConverter::ConvertFromUnicode(TDes8 &,const TDesC16 &)const. Users of the Character Conversion API should use one of the overloads of CCnvCharacterSetConverter::ConvertFromUnicode(TDes8 &,const TDesC16 &)const instead.
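As a rough illustration of the roles of the replacement text and the array of unconvertible indices, here is a toy Unicode-to-ASCII conversion in standard C++. It is not the CHARCONV implementation: `FromUnicodeResult` and `ToAscii` are hypothetical names, and the sketch treats every UTF-16 code unit above 0x7F as unconvertible.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Result of the toy conversion: the converted bytes plus the ascending
// indices (into the UTF-16 input) of code units with no ASCII mapping.
struct FromUnicodeResult {
    std::string foreign;
    std::vector<std::size_t> unconvertibleIndices;
};

// Converts UTF-16 code units to 7-bit ASCII. Any code unit above 0x7F is
// written as `replacement` and its input index is recorded.
FromUnicodeResult ToAscii(const std::u16string& unicode,
                          const std::string& replacement) {
    FromUnicodeResult result;
    for (std::size_t i = 0; i < unicode.size(); ++i) {
        char16_t unit = unicode[i];
        if (unit <= 0x7F) {
            result.foreign.push_back(static_cast<char>(unit));
        } else {
            // Indices are appended in input order, so they stay ascending.
            assert(result.unconvertibleIndices.empty() ||
                   result.unconvertibleIndices.back() < i);
            result.foreign += replacement;
            result.unconvertibleIndices.push_back(i);
        }
    }
    return result;
}
```

The recorded indices let a caller go back to the original text and decide what to do with each character that was replaced.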
|
|
DoConvertFromUnicode(const SCnvConversionData &,TEndianness,const TDesC8 &,TDes8 &,const TDesC16 &,TArrayOfAscendingIndices
&,TUint &,TUint)
IMPORT_C static TInt DoConvertFromUnicode(const SCnvConversionData &aConversionData, TEndianness aDefaultEndiannessOfForeignCharacters,
const TDesC8 &aReplacementForUnconvertibleUnicodeCharacters, TDes8 &aForeign, const TDesC16 &aUnicode, TArrayOfAscendingIndices
&aIndicesOfUnconvertibleCharacters, TUint &aOutputConversionFlags, TUint aInputConversionFlags);
Converts Unicode text into another character set. The Unicode text specified in aUnicode is converted using the conversion data object (aConversionData) provided by the plug-in for the foreign character set, and the converted text is returned in aForeign.
This overload differs from the previous one in that it allows the caller to specify flags which give more control over the conversion.
Note
This is a utility function that should only be called from a plug-in conversion library's implementation of CCnvCharacterSetConverter::ConvertFromUnicode(TDes8 &,const TDesC16 &)const. Users of the Character Conversion API should use one of the overloads of CCnvCharacterSetConverter::ConvertFromUnicode(TDes8 &,const TDesC16 &)const instead.
|
|
IMPORT_C static TInt DoConvertToUnicode(const SCnvConversionData &aConversionData, TEndianness aDefaultEndiannessOfForeignCharacters,
TDes16 &aUnicode, const TDesC8 &aForeign, TInt &aNumberOfUnconvertibleCharacters, TInt &aIndexOfFirstByteOfFirstUnconvertibleCharacter);
Converts non-Unicode text into Unicode. The non-Unicode text specified in aForeign is converted using the conversion data object (aConversionData) provided by the plug-in for the foreign character set, and the converted text is returned in aUnicode.
Notes:
This is a utility function that should only be called from a plug-in conversion library's implementation of CCnvCharacterSetConverter::ConvertToUnicode(TDes16 &,const TDesC8 &,TInt &)const. Ordinary users of the Character Conversion API should use one of the overloads of CCnvCharacterSetConverter::ConvertToUnicode(TDes16 &,const TDesC8 &,TInt &)const instead.
The last two arguments return information about unconverted characters. Because Unicode is intended to cover all possible characters, these rarely report anything other than zero characters. However they report the existence of unconvertible characters if the input descriptor aForeign contains illegal characters, i.e. values not in the foreign character set.
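To make the two reporting arguments concrete, here is a toy ASCII-to-Unicode conversion in standard C++. It is a sketch only: `ToUnicodeResult` and `FromAscii` are hypothetical names, and using U+FFFD as the substitute and -1 for "no unconvertible character" are assumptions of this example, not documented behaviour.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Mirrors the information returned through the last two arguments.
struct ToUnicodeResult {
    std::u16string unicode;
    int numberOfUnconvertibleCharacters;
    int indexOfFirstByteOfFirstUnconvertibleCharacter;  // -1 if none (assumed)
};

// Converts 7-bit ASCII bytes to UTF-16. Bytes above 0x7F are illegal in
// ASCII, so they are counted and replaced with U+FFFD.
ToUnicodeResult FromAscii(const std::string& foreign) {
    ToUnicodeResult result{std::u16string(), 0, -1};
    for (std::size_t i = 0; i < foreign.size(); ++i) {
        unsigned char byte = static_cast<unsigned char>(foreign[i]);
        if (byte <= 0x7F) {
            result.unicode.push_back(static_cast<char16_t>(byte));
        } else {
            result.unicode.push_back(u'\uFFFD');
            if (result.numberOfUnconvertibleCharacters == 0) {
                result.indexOfFirstByteOfFirstUnconvertibleCharacter =
                    static_cast<int>(i);
            }
            ++result.numberOfUnconvertibleCharacters;
        }
    }
    return result;
}
```

For well-formed input the count stays at zero, which matches the remark that these arguments rarely report anything else.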
|
|
DoConvertToUnicode(const SCnvConversionData &,TEndianness,TDes16 &,const TDesC8 &,TInt &,TInt &,TUint &,TUint)
IMPORT_C static TInt DoConvertToUnicode(const SCnvConversionData &aConversionData, TEndianness aDefaultEndiannessOfForeignCharacters,
TDes16 &aUnicode, const TDesC8 &aForeign, TInt &aNumberOfUnconvertibleCharacters, TInt &aIndexOfFirstByteOfFirstUnconvertibleCharacter,
TUint &aOutputConversionFlags, TUint aInputConversionFlags);
Converts non-Unicode text into Unicode. The non-Unicode text specified in aForeign is converted using the conversion data object (aConversionData) provided by the plug-in for the foreign character set, and the converted text is returned in aUnicode.
This overload differs from the previous one in that it allows the caller to specify flags which give more control over the conversion.
Notes:
This is a utility function that should only be called from a plug-in conversion library's implementation of CCnvCharacterSetConverter::ConvertToUnicode(TDes16 &,const TDesC8 &,TInt &)const. Ordinary users of the Character Conversion API should use one of the overloads of CCnvCharacterSetConverter::ConvertToUnicode(TDes16 &,const TDesC8 &,TInt &)const instead.
The aNumberOfUnconvertibleCharacters and aIndexOfFirstByteOfFirstUnconvertibleCharacter arguments return information about unconverted characters. Because Unicode is intended to cover all possible characters, these rarely report anything other than zero characters. However they report the existence of unconvertible characters if the input descriptor aForeign contains illegal characters, i.e. values not in the foreign character set.
|
|
IMPORT_C static const SCnvConversionData& AsciiConversionData();
Returns a ready-made SCnvConversionData object for converting between Unicode and ASCII. This can be passed into the aConversionData parameter to CCnvCharacterSetConverter::DoConvertFromUnicode(const SCnvConversionData &,TEndianness,const TDesC8 &,TDes8 &,const TDesC16 &,TArrayOfAscendingIndices &) or CCnvCharacterSetConverter::DoConvertToUnicode(const SCnvConversionData &,TEndianness,TDes16 &,const TDesC8 &,TInt &,TInt &).
Note: This utility function should only be called by a plug-in conversion library.
|
inline TDowngradeForExoticLineTerminatingCharacters GetDowngradeForExoticLineTerminatingCharacters();
|
class TArrayOfAscendingIndices;
Holds an ascending array of the indices of the characters in the source Unicode text which could not be converted by CCnvCharacterSetConverter::ConvertFromUnicode(TDes8 &,const TDesC16 &)const into the foreign character set.
Defined in CCnvCharacterSetConverter::TArrayOfAscendingIndices:
AppendIndex(TInt)
Appends an index to the array of indices.
EAppendFailed
The append failed.
EAppendSuccessful
The append succeeded.
NumberOfIndices()const
Returns the number of indices in the array.
Remove(TInt)
Deletes a single index from the array.
RemoveAll()
Deletes all indices from the array.
TAppendResult
The return value of CCnvCharacterSetConverter::TArrayOfAscendingIndices::AppendIndex().
TArrayOfAscendingIndices()
C++ constructor. The array is initialised to be of length zero.
operator[](TInt)const
Gets the value of the specified index.
TArrayOfAscendingIndices()
inline TArrayOfAscendingIndices();
C++ constructor. The array is initialised to be of length zero.
AppendIndex(TInt)
IMPORT_C TAppendResult AppendIndex(TInt aIndex);
Appends an index to the array of indices.
The value of aIndex should be greater than that of the last index in the array, to maintain an ascending array. The return value should be tested to see whether the function succeeded or not.
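The behaviour described above can be approximated in standard C++ as follows. The source does not say why an append can fail, so this sketch assumes a fixed capacity and also rejects out-of-order indices; `AscendingIndices` and `kCapacity` are hypothetical and do not reflect the real class layout.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical fixed-capacity array of strictly ascending indices.
class AscendingIndices {
public:
    enum AppendResult { AppendFailed, AppendSuccessful };
    static constexpr std::size_t kCapacity = 4;  // assumed limit for the sketch

    // Appends `index` if there is room and it is greater than the last entry.
    AppendResult AppendIndex(int index) {
        if (count_ == kCapacity) return AppendFailed;
        if (count_ > 0 && index <= indices_[count_ - 1]) return AppendFailed;
        indices_[count_++] = index;
        return AppendSuccessful;
    }

    // Deletes the entry at `position`, shifting later entries down.
    void Remove(std::size_t position) {
        assert(position < count_);
        for (std::size_t i = position + 1; i < count_; ++i) {
            indices_[i - 1] = indices_[i];
        }
        --count_;
    }

    void RemoveAll() { count_ = 0; }

    std::size_t NumberOfIndices() const { return count_; }

    int operator[](std::size_t position) const {
        assert(position < count_);
        return indices_[position];
    }

private:
    std::array<int, kCapacity> indices_{};
    std::size_t count_ = 0;
};
```

Whatever the real failure condition is, the point for callers is the same: check the result of every append instead of assuming it succeeded.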
|
|
Remove(TInt)
inline void Remove(TInt aIndexOfIndex);
Deletes a single index from the array.
|
RemoveAll()
inline void RemoveAll();
Deletes all indices from the array.
NumberOfIndices()const
inline TInt NumberOfIndices() const;
Returns the number of indices in the array.
|
operator[](TInt)const
inline TInt operator[](TInt aIndexOfIndex) const;
Gets the value of the specified index.
|
|
TAppendResult
TAppendResult
The return value of CCnvCharacterSetConverter::TArrayOfAscendingIndices::AppendIndex().
|
struct SCharacterSet;
Stores information about a non-Unicode character set. The information is used to locate the conversion information required by CCnvCharacterSetConverter::ConvertFromUnicode(TDes8 &,const TDesC16 &)const and CCnvCharacterSetConverter::ConvertToUnicode(TDes16 &,const TDesC8 &,TInt &)const.
An array of these structs that contain all available character sets can be generated by CCnvCharacterSetConverter::CreateArrayOfCharacterSetsAvailableLC(RFs &) and CCnvCharacterSetConverter::CreateArrayOfCharacterSetsAvailableL(RFs &), and is used by one of the overloads of CCnvCharacterSetConverter::PrepareToConvertToOrFromL(TUint,const CArrayFix< SCharacterSet > &,RFs &).
Defined in CCnvCharacterSetConverter::SCharacterSet
:
Identifier()const
Gets the character set's UID.
Name()const
Gets the full path and filename of the DLL which implements conversion for the c...
NameIsFileName()const
Tests whether a filename given by the function CCnvCharacterSetConverter::SChara...
Identifier()const
inline TUint Identifier() const;
Gets the character set's UID.
|
NameIsFileName()const
inline TBool NameIsFileName() const;
Tests whether a filename given by the function CCnvCharacterSetConverter::SCharacterSet::Name()const is a real file name (i.e. conversion is provided by a plug-in DLL), or just the character set name (i.e. conversion is built into Symbian OS).
Note: If the function returns ETrue then the path and filename can be parsed using TParse or TParsePtrC functions to obtain just the filename.
|
Name()const
inline TPtrC Name() const;
Gets the full path and filename of the DLL which implements conversion for the character set.
If the character set is one for which conversion is built into Symbian OS rather than implemented by a plug-in DLL, the function just returns the name of the character set. The CCnvCharacterSetConverter::SCharacterSet::NameIsFileName()const function can be used to determine whether or not it is legal to create a TParsePtrC object over the descriptor returned by CCnvCharacterSetConverter::SCharacterSet::Name()const.
Notes:
The name returned cannot be treated as an Internet-standard name. It is locale-independent and should be mapped to the locale-dependent name by software at a higher level before being shown to the user. Conversion from Internet-standard names of character sets to the UID identifiers is provided by the member function CCnvCharacterSetConverter::ConvertStandardNameOfCharacterSetToIdentifierL(const TDesC8 &,RFs &).
Typically, to find the user-displayable name (as opposed to the internet-standard name) of a character set, you would do something like this:
const CCnvCharacterSetConverter::SCharacterSet& characterSet=...;
const TPtrC userDisplayable(characterSet.NameIsFileName()? TParsePtrC(characterSet.Name()).Name():
characterSet.Name());
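The same decision can be sketched in standard C++ for readers who want to try the logic outside Symbian OS. `BaseNameWithoutExtension` and `UserDisplayable` are hypothetical helpers, the path in the usage note is invented, and the sketch assumes that the parsed name is the file name with drive, directories and extension removed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns the file name without drive, directories or extension.
// Both '\\' and '/' are treated as separators (an assumption of this sketch).
std::string BaseNameWithoutExtension(const std::string& path) {
    std::size_t slash = path.find_last_of("\\/");
    std::string file =
        (slash == std::string::npos) ? path : path.substr(slash + 1);
    std::size_t dot = file.rfind('.');
    return (dot == std::string::npos) ? file : file.substr(0, dot);
}

// Parses the name only when it is known to be a file name; otherwise the
// character set name is returned unchanged.
std::string UserDisplayable(const std::string& name, bool nameIsFileName) {
    return nameIsFileName ? BaseNameWithoutExtension(name) : name;
}
```

For example, `UserDisplayable("z:\\plugins\\example.dll", true)` yields `example`, while a built-in name passed with `false` comes back untouched.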
|
TAvailability
Indicates whether a character set is available or unavailable for conversion. Used by the second overload of CCnvCharacterSetConverter::PrepareToConvertToOrFromL(TUint,const CArrayFix< SCharacterSet > &,RFs &).
|
TError
Conversion error flags. At this stage there is only one error flag; others may be added in the future.
|
TEndianness
Specifies the default endian-ness of the current character set. Used by CCnvCharacterSetConverter::SetDefaultEndiannessOfForeignCharacters(TEndianness).
|
TDowngradeForExoticLineTerminatingCharacters
Downgrade for line and paragraph separators
|
n/a
Output flag used to indicate that the last character to convert in the source descriptor is the first half of a surrogate pair.
Note: This enumeration can be used in the CCnvCharacterSetConverter::DoConvertToUnicode(const SCnvConversionData &,TEndianness,TDes16 &,const TDesC8 &,TInt &,TInt &) and CCnvCharacterSetConverter::DoConvertFromUnicode(const SCnvConversionData &,TEndianness,const TDesC8 &,TDes8 &,const TDesC16 &,TArrayOfAscendingIndices &) functions. These are part of the Character Conversion Plug-in Provider API and are for use by plug-in conversion libraries only.
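The condition this flag reports can be checked in standard C++ as sketched below. The helper names are hypothetical; only the UTF-16 high-surrogate range (0xD800 to 0xDBFF) comes from the Unicode standard.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// True if `unit` is the first (high) half of a UTF-16 surrogate pair.
bool IsHighSurrogate(char16_t unit) {
    return unit >= 0xD800 && unit <= 0xDBFF;
}

// True if the input ends with a high surrogate whose second half is missing.
bool EndsWithTruncatedSurrogatePair(const std::u16string& unicode) {
    return !unicode.empty() && IsHighSurrogate(unicode.back());
}

// Number of leading code units that can be converted now; a trailing high
// surrogate is held back until the rest of the pair arrives.
std::size_t ConvertibleLength(const std::u16string& unicode) {
    return EndsWithTruncatedSurrogatePair(unicode) ? unicode.size() - 1
                                                   : unicode.size();
}
```

A caller receiving text in chunks could carry the held-back code unit over to the start of the next chunk.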
|
n/a
|
n/a
Initial value for the state argument in a set of related calls to CCnvCharacterSetConverter::ConvertToUnicode(TDes16 &,const TDesC8 &,TInt &)const.
|
n/a
|