CCnvCharacterSetConverter
These examples illustrate how CCnvCharacterSetConverter might be used to convert chunks of text from Unicode to another character set, from another character set to Unicode, and from a character set to Unicode when the text arrives in fragments. The first three examples use the simpler variant of PrepareToConvertToOrFromL(). The final example demonstrates, in code fragment form, the use of the other variant.
This example shows how to convert Unicode text, in chunks, to another character set. The conversion process has two parts. In the first part the code checks to see that the selected character set is available and opens the conversion data file, and in the second the text is actually converted. The text is converted in chunks so that it is not necessary to guess the full size of the converted output text in advance.
The function below first uses PrepareToConvertToOrFromL() to check whether the nominated character set is supported, and leaves if it is not. This variant of PrepareToConvertToOrFromL() is preferred here because we are told which character set to convert to, and because the other variant would panic the thread if the character set were not available. The code also creates an output descriptor and a ‘remainder’ buffer for holding the unconverted Unicode characters. This remainder buffer is initialised with the text in the input descriptor.
LOCAL_C void ConvertUnicodeTextL(CCnvCharacterSetConverter& aCharacterSetConverter,
        RFs& aFileServerSession, TUint aForeignCharacterSet, const TDesC16& aUnicodeText)
{
    // check to see if the character set is supported - if not then leave
    if (aCharacterSetConverter.PrepareToConvertToOrFromL(aForeignCharacterSet,
            aFileServerSession) != CCnvCharacterSetConverter::EAvailable)
        User::Leave(KErrNotSupported);
    // Create a small-ish output buffer. 20 bytes recommended minimum
    TBuf8<20> outputBuffer;
    // Create a buffer for the unconverted text - initialised with input descriptor
    TPtrC16 remainderOfUnicodeText(aUnicodeText);
After it has been confirmed that the character set is available, a loop is set up to convert the text in the remainder buffer, which initially contains all the text in the input descriptor. ConvertFromUnicode() converts characters from the remainder buffer until the small output buffer is full, and the converted (non-Unicode) contents of the output buffer are then stored. ConvertFromUnicode() returns the number of characters it did not convert, and the remainder buffer is reset to just those characters.
This process is repeated until the remainder buffer is empty, and the function completes. The code fragment below also includes code to check for ill-formed input.
    for(;;) // conversion loop
    {
        // Start conversion. When the output buffer is full, return the
        // number of characters that were not converted
        const TInt returnValue=aCharacterSetConverter.ConvertFromUnicode(outputBuffer,
                remainderOfUnicodeText);
        // check to see that the descriptor isn’t corrupt - leave if it is
        if (returnValue==CCnvCharacterSetConverter::EErrorIllFormedInput)
            User::Leave(KErrCorrupt);
        else if (returnValue<0) // future-proof against "TError" expanding
            User::Leave(KErrGeneral);
        // Do something here to store the contents of the output buffer.
        // Finish conversion if there are no unconverted characters in the remainder buffer
        if (returnValue==0)
            break;
        // Remove the converted source text from the remainder buffer.
        // The remainder buffer is then fed back into loop
        remainderOfUnicodeText.Set(remainderOfUnicodeText.Right(returnValue));
    }
}
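The remainder loop above does not depend on anything Symbian-specific. As a rough, portable illustration (not the CharConv implementation), the standard C++ sketch below mirrors it. The hypothetical helper ConvertChunkToLatin1() stands in for ConvertFromUnicode(): it fills an output buffer of at most 20 bytes and returns the number of UTF-16 code units it did not convert, so the caller can shorten the remainder exactly as Right(returnValue) does. The target character set is assumed to be ISO-8859-1, and any code unit above U+00FF (including each half of a surrogate pair) simply becomes '?'; the real converter's handling of unconvertible characters is not modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for ConvertFromUnicode(): converts as much of `input` as
// fits into `output` (at most `maxLength` bytes) and returns the number of UTF-16
// code units left unconverted. Assumes ISO-8859-1 as the target; any code unit
// above U+00FF (including each half of a surrogate pair) becomes '?'.
std::size_t ConvertChunkToLatin1(std::string& output, std::size_t maxLength,
                                 const std::u16string& input)
{
    output.clear();
    std::size_t consumed = 0;
    while (consumed < input.size() && output.size() < maxLength)
    {
        const char16_t unit = input[consumed];
        output.push_back(unit <= 0xFF ? static_cast<char>(unit) : '?');
        ++consumed;
    }
    return input.size() - consumed;
}

// Mirrors ConvertUnicodeTextL(): a 20-byte output buffer is filled repeatedly and
// the remainder is cut down to its last `unconverted` units, like Right(returnValue).
std::string ConvertUnicodeText(const std::u16string& unicodeText)
{
    const std::size_t kOutputBufferSize = 20;  // same size as TBuf8<20> above
    std::string result;        // plays the role of "store the contents of the output buffer"
    std::string outputBuffer;
    std::u16string remainder = unicodeText;
    for (;;)
    {
        const std::size_t unconverted =
            ConvertChunkToLatin1(outputBuffer, kOutputBufferSize, remainder);
        result += outputBuffer;
        if (unconverted == 0)
            break;
        remainder = remainder.substr(remainder.size() - unconverted);
    }
    return result;
}
```

Because each call reports how much input is left rather than how much output was written, the caller never has to predict the size of the converted text; only the output buffer size is fixed.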
This example shows how to convert text in a non-Unicode character set, in chunks, into Unicode text. The process is similar to that of the previous example, which converts from Unicode to a non-Unicode character set. In the first part the code checks to see that the selected character set is available and opens the conversion data file, and in the second the text is actually converted. The text is converted in chunks so that it is not necessary to guess the full size of the converted output text in advance.
The function below first uses PrepareToConvertToOrFromL() to check whether the nominated character set is supported, and leaves if it is not. As in the previous example, this variant of PrepareToConvertToOrFromL() is preferred because we are told which character set to convert from, and because the other variant would panic the thread if the character set were not available. The function also creates an output descriptor, a state variable, and a ‘remainder’ buffer for holding the unconverted characters; the remainder buffer is initialised with the text in the input descriptor. The state variable is initialised with CCnvCharacterSetConverter::KStateDefault. After initialisation it should not be tampered with, but simply passed into each subsequent call to ConvertToUnicode().
LOCAL_C void ConvertForeignTextL(CCnvCharacterSetConverter& aCharacterSetConverter,
        RFs& aFileServerSession, TUint aForeignCharacterSet, const TDesC8& aForeignText)
{
    // check to see if the character set is supported - if not then leave
    if (aCharacterSetConverter.PrepareToConvertToOrFromL(aForeignCharacterSet,
            aFileServerSession) != CCnvCharacterSetConverter::EAvailable)
        User::Leave(KErrNotSupported);
    // Create a small output buffer
    TBuf16<20> outputBuffer;
    // Create a buffer for the unconverted text - initialised with the input descriptor
    TPtrC8 remainderOfForeignText(aForeignText);
    // Create a "state" variable and initialise it with CCnvCharacterSetConverter::KStateDefault
    // After initialisation the state variable must not be tampered with.
    // Simply pass into each subsequent call of ConvertToUnicode()
    TInt state=CCnvCharacterSetConverter::KStateDefault;
After it has been confirmed that the character set is available, a loop is set up to convert the text in the remainder buffer, which initially contains all the text in the input descriptor. ConvertToUnicode() converts characters from the remainder buffer until the output buffer is full, and the Unicode contents of the output buffer are then stored. The remainder buffer is then reset so that it contains only unconverted text. This process is repeated until the remainder buffer is empty, and the function completes. The code fragment below also includes code to check for ill-formed input.
    for(;;) // conversion loop
    {
        // Start conversion. When the output buffer is full, return the number
        // of characters that were not converted
        const TInt returnValue=aCharacterSetConverter.ConvertToUnicode(outputBuffer,
                remainderOfForeignText, state);
        // check to see that the descriptor isn’t corrupt - leave if it is
        if (returnValue==CCnvCharacterSetConverter::EErrorIllFormedInput)
            User::Leave(KErrCorrupt);
        else if (returnValue<0) // future-proof against "TError" expanding
            User::Leave(KErrGeneral);
        // Do something here to store the contents of the output buffer.
        // Finish conversion if there are no unconverted
        // characters in the remainder buffer
        if (returnValue==0)
            break;
        // Remove converted source text from the remainder buffer.
        // The remainder buffer is then fed back into loop
        remainderOfForeignText.Set(remainderOfForeignText.Right(returnValue));
    }
}
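The same pattern in the other direction can also be sketched in portable C++. Here the hypothetical ConvertChunkFromUtf8() plays the part of ConvertToUnicode(): it decodes UTF-8 (assumed as the foreign character set purely for illustration) into at most 20 UTF-16 units, returns the number of bytes left unconverted, and returns -1 in place of EErrorIllFormedInput. Following the behaviour described for ConvertToUnicode() in the fragmented-text example later in this section, an incomplete sequence at the end is only reported as an error when nothing before it can be converted. UTF-8 is stateless, so the state argument is not modelled, and overlong forms and encoded surrogates are not rejected.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for ConvertToUnicode(): decodes UTF-8 from `input` into
// `output` (at most `maxLength` UTF-16 units). Returns the number of input bytes
// left unconverted, or -1 if the input is ill-formed. Simplified validation only.
long ConvertChunkFromUtf8(std::u16string& output, std::size_t maxLength,
                          const std::string& input)
{
    output.clear();
    std::size_t pos = 0;
    while (pos < input.size())
    {
        const unsigned char lead = static_cast<unsigned char>(input[pos]);
        std::size_t length;
        char32_t cp;
        if (lead < 0x80)                { length = 1; cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { length = 2; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { length = 3; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { length = 4; cp = lead & 0x07; }
        else return -1;  // invalid lead byte
        if (pos + length > input.size())
            // Incomplete final sequence: an error only if nothing could be converted
            return pos == 0 ? -1 : static_cast<long>(input.size() - pos);
        for (std::size_t i = 1; i < length; ++i)
        {
            const unsigned char cont = static_cast<unsigned char>(input[pos + i]);
            if ((cont & 0xC0) != 0x80)
                return -1;  // expected a continuation byte
            cp = (cp << 6) | (cont & 0x3F);
        }
        const std::size_t unitsNeeded = cp > 0xFFFF ? 2 : 1;
        if (output.size() + unitsNeeded > maxLength)
            break;  // output buffer full: leave this character for the next call
        if (cp > 0xFFFF)
        {
            cp -= 0x10000;  // encode as a surrogate pair
            output.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            output.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
        else
        {
            output.push_back(static_cast<char16_t>(cp));
        }
        pos += length;
    }
    return static_cast<long>(input.size() - pos);
}

// Mirrors ConvertForeignTextL(): returns false where the Symbian code would leave.
bool ConvertForeignText(const std::string& foreignText, std::u16string& result)
{
    const std::size_t kOutputBufferSize = 20;  // same size as TBuf16<20> above
    std::u16string outputBuffer;
    std::string remainder = foreignText;
    result.clear();
    for (;;)
    {
        const long returnValue =
            ConvertChunkFromUtf8(outputBuffer, kOutputBufferSize, remainder);
        if (returnValue < 0)
            return false;  // ill-formed input
        result += outputBuffer;
        if (returnValue == 0)
            return true;
        remainder = remainder.substr(remainder.size() - static_cast<std::size_t>(returnValue));
    }
}
```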
The example below demonstrates how to convert fragmented text from a non-Unicode character set into Unicode. The main difficulty in converting fragmented text is that received chunks may begin or end with bytes from an incomplete character.
To overcome this problem, implementations must ensure that the descriptors passed to ConvertToUnicode() always begin with a complete character (making the output descriptor at least 20 elements long should be enough to ensure this), and that conversions only progress to completion for the final chunk of text, in which the last character is complete. In the function below this is achieved by beginning the buffer for each chunk with a small amount of unconverted text from the previous chunk. The buffer is then guaranteed to begin with a complete character: any ‘loose’ bytes from the end of the last chunk and the beginning of the new one combine to form a complete character.
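The carry-over idea can be illustrated independently of CharConv. The sketch below is a hypothetical FragmentAssembler in standard C++ that assumes UTF-8 input. Instead of the fixed 20-byte threshold used in the function below, it locates the last complete character explicitly, which only works because UTF-8 lead and continuation bytes can be told apart; for other multi-byte encodings the threshold approach is the more general one. The complete prefix is handed on (here simply appended to a string, standing in for a call to ConvertToUnicode()), and the incomplete tail is kept to be joined with the next fragment.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns the length of the longest prefix of `buffer` that does not end in the
// middle of a UTF-8 sequence. Only the last 4 bytes need examining. Invalid bytes
// are treated as complete so that the decoder, not this function, rejects them.
std::size_t CompleteUtf8PrefixLength(const std::string& buffer)
{
    const std::size_t size = buffer.size();
    for (std::size_t back = 1; back <= 4 && back <= size; ++back)
    {
        const unsigned char byte = static_cast<unsigned char>(buffer[size - back]);
        if ((byte & 0xC0) == 0x80)
            continue;  // continuation byte: keep looking for the lead byte
        const std::size_t expected = byte < 0x80 ? 1
                                   : (byte & 0xE0) == 0xC0 ? 2
                                   : (byte & 0xF0) == 0xE0 ? 3
                                   : (byte & 0xF8) == 0xF0 ? 4 : 1;
        return expected > back ? size - back : size;
    }
    return size;  // no lead byte in the tail: let the decoder report it
}

// Hypothetical receiver: each fragment is appended to the bytes carried over from
// the previous one; the complete prefix is handed on and the incomplete tail kept.
// On the last fragment everything is handed on, so a truncated final character
// reaches the decoder and is reported there as ill-formed.
class FragmentAssembler
{
public:
    void Add(const std::string& fragment, bool isLastFragment)
    {
        carry_ += fragment;
        const std::size_t ready =
            isLastFragment ? carry_.size() : CompleteUtf8PrefixLength(carry_);
        completeText_ += carry_.substr(0, ready);  // stands in for ConvertToUnicode()
        carry_.erase(0, ready);
    }
    const std::string& CompleteText() const { return completeText_; }
    std::size_t PendingBytes() const { return carry_.size(); }

private:
    std::string carry_;         // loose bytes from the end of the previous fragment
    std::string completeText_;  // everything handed on so far
};
```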
The function first uses PrepareToConvertToOrFromL() to check whether the nominated character set is supported, and leaves if it is not. As in the previous examples, this variant of PrepareToConvertToOrFromL() is preferred because we are told which character set to convert from, and because the other variant would panic the thread if the character set were not available. The function also creates a buffer to hold both the unconverted text fragment from the previous chunk and the new chunk.
LOCAL_C void ConvertForeignTextL(CCnvCharacterSetConverter& aCharacterSetConverter,
        RFs& aFileServerSession, TUint aForeignCharacterSet)
{
    // check to see if the character set is supported - if not then leave
    if (aCharacterSetConverter.PrepareToConvertToOrFromL(aForeignCharacterSet,
            aFileServerSession) != CCnvCharacterSetConverter::EAvailable)
        User::Leave(KErrNotSupported);
    // Create a buffer for holding non-Unicode text to be converted
    const TInt KMaximumLengthOfBufferForForeignText=200;
    TUint8 bufferForForeignText[KMaximumLengthOfBufferForForeignText];
    // Create a variable for indicating the actual amount of text in the buffer
    TInt lengthOfBufferForForeignText=0;
    // Create a "state" variable and initialise it with CCnvCharacterSetConverter::KStateDefault.
    // It is declared outside the outer loop so that the conversion state (for example,
    // the shift mode of a stateful encoding) carries over from one chunk to the next.
    // After initialisation the state variable must not be tampered with.
    // Simply pass it into each subsequent call of ConvertToUnicode()
    TInt state=CCnvCharacterSetConverter::KStateDefault;
A loop is then set up to get a new chunk and append it to the unconverted text fragment from the previous chunk. The code also contains a placeholder for finding out whether the current chunk is the last chunk, and stores this information in a flag. In addition, it creates an output descriptor and a ‘remainder’ buffer for holding the unconverted text. The state variable is created once, before this loop, and initialised with CCnvCharacterSetConverter::KStateDefault; after initialisation it should not be tampered with, but simply passed into every call to ConvertToUnicode(), across all chunks.
    // Outer loop.
    // Appends new chunk to fragment from previous chunk
    // Then passes the buffer to the conversion loop.
    for (;;)
    {
        // Create a modifiable pointer descriptor for the next chunk of non-Unicode text
        TPtr8 nextChunkOfForeignText(bufferForForeignText+lengthOfBufferForForeignText,
                KMaximumLengthOfBufferForForeignText-lengthOfBufferForForeignText);
        // Insert code to load next chunk of non-Unicode text here
        // Calculate the length of the next chunk of text to be processed
        const TInt lengthOfNextChunkOfForeignText=nextChunkOfForeignText.Length();
        // Update the amount of text held in the buffer
        lengthOfBufferForForeignText+=lengthOfNextChunkOfForeignText;
        // Set whether this is the last chunk - find out from the source of text.
        // Placeholder: this assumes the source signals the end with a zero-length
        // chunk; replace the expression to suit the actual source. Note that even
        // if the length of this chunk is zero, we can't just exit this function
        // here, as bufferForForeignText may not be empty
        // (i.e. lengthOfBufferForForeignText>0)
        const TBool isLastChunkOfForeignText=(lengthOfNextChunkOfForeignText==0);
        // Create a small output buffer
        TBuf16<20> outputBuffer;
        // Create a remainder buffer for the unconverted text - used in conversion loop
        TPtrC8 remainderOfForeignText(bufferForForeignText, lengthOfBufferForForeignText);
The remainder buffer is passed to the conversion loop. ConvertToUnicode() converts characters from the remainder buffer until the output buffer is full, and the Unicode contents of the output buffer are then stored. The remainder buffer is then reset so that it contains only unconverted text. This process is repeated until the remainder buffer contains fewer than 20 bytes; 20 is chosen so that the function never tries to convert a lone partial multi-byte character. The remaining unconverted bytes are copied to the start of the foreign text buffer, and control returns to the outer loop. The process then repeats, with a new chunk being appended to the foreign text buffer, and so on.
If the ‘last chunk’ flag was set in the outer loop, the conversion instead continues until the remainder buffer is empty, and the function then returns. The code fragment below also includes code to check for ill-formed input.
        // The conversion loop. This loop takes the text prepared by the outer loop and converts it
        for(;;) // conversion loop
        {
            const TInt lengthOfRemainderOfForeignText=remainderOfForeignText.Length();
            if (isLastChunkOfForeignText)
            {
                if (lengthOfRemainderOfForeignText==0)
                    return; // the single point of exit of this function
            }
            else
            {
                // As this isn't the last chunk, the input descriptor may end with an
                // incomplete sequence. ConvertToUnicode() returns
                // CCnvCharacterSetConverter::EErrorIllFormedInput for that only if *none*
                // of the input descriptor can be consumed. Therefore if the input
                // descriptor is long enough (20 elements or longer is plenty adequate)
                // there is no danger of this error being returned for this reason.
                // If it's shorter than that, simply put it at the start of the buffer
                // so that it gets converted with the next chunk.
                if (lengthOfRemainderOfForeignText<20)
                {
                    // put any remaining foreign text at the start of bufferForForeignText
                    lengthOfBufferForForeignText=lengthOfRemainderOfForeignText;
                    Mem::Copy(bufferForForeignText, remainderOfForeignText.Ptr(),
                            lengthOfBufferForForeignText);
                    break;
                }
            }
            const TInt returnValue=aCharacterSetConverter.ConvertToUnicode(outputBuffer,
                    remainderOfForeignText, state);
            if (returnValue==CCnvCharacterSetConverter::EErrorIllFormedInput)
                User::Leave(KErrCorrupt);
            else if (returnValue<0) // future-proof against "TError" expanding
                User::Leave(KErrGeneral);
            // Do something here to store the contents of the output buffer.
            remainderOfForeignText.Set(remainderOfForeignText.Right(returnValue));
        }
    }
}
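The state argument matters for stateful (modal) encodings, where an escape or shift byte changes how the following bytes are interpreted, and it must survive from one ConvertToUnicode() call to the next for as long as the text continues. The sketch below uses a deliberately simplified, hypothetical encoding in standard C++ (it is not any CharConv character set): SO (0x0E) switches to a shifted mode in which byte b decodes to U+0080 + b, and SI (0x0F) switches back. DecodeChunk() reads and updates an int state in the way ConvertToUnicode() uses its state argument, and kStateDefault is only a stand-in for CCnvCharacterSetConverter::KStateDefault.

```cpp
#include <cassert>
#include <string>

// Stand-ins for the converter's state values (hypothetical toy encoding).
const int kStateDefault = 0;  // plays the role of CCnvCharacterSetConverter::KStateDefault
const int kStateShifted = 1;

// Decodes one chunk of the toy encoding. `state` is read on entry and updated on
// exit, so the next chunk continues in whatever mode this one ended in.
std::u16string DecodeChunk(const std::string& chunk, int& state)
{
    std::u16string out;
    for (unsigned char byte : chunk)
    {
        if (byte == 0x0E) { state = kStateShifted; continue; }  // SO: shift out
        if (byte == 0x0F) { state = kStateDefault; continue; }  // SI: shift in
        out.push_back(static_cast<char16_t>(state == kStateShifted ? 0x80 + byte : byte));
    }
    return out;
}
```

For example, decoding "caf\x0E" and then "i\x0F!" with one persistent state yields "café!", because the shift byte at the end of the first chunk still applies to the first byte of the second. If the state is reset to kStateDefault between the two calls, the second chunk decodes as plain "i!" and the result is "cafi!".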
The faster variant of PrepareToConvertToOrFromL() is suitable if the required character set is to be selected by the user from the list of available character sets, or if frequent conversions to and from different character sets are needed. It is faster because the array of available character sets is built once and then reused, instead of the file system being searched on every call. In most cases the other variant is preferred. The code fragments below briefly outline the usage of the faster variant.
As with the other variant, a file server session must be passed in; this is used when searching the file system for available character sets. The CCnvCharacterSetConverter object is created, and used to invoke the CreateArrayOfCharacterSetsAvailableLC() function. This generates an array containing all the available character sets.
// Set up file server session
RFs fileServerSession;
CleanupClosePushL(fileServerSession);
User::LeaveIfError(fileServerSession.Connect());
// Create CCnvCharacterSetConverter
CCnvCharacterSetConverter* characterSetConverter=CCnvCharacterSetConverter::NewLC();
// Create array of available character sets
CArrayFix<CCnvCharacterSetConverter::SCharacterSet>* arrayOfCharacterSetsAvailable=
        characterSetConverter->CreateArrayOfCharacterSetsAvailableLC(fileServerSession);
The character sets in the array might be displayed using code similar to that below. In the fragment the loop iterates through the array elements and prints the name of each referenced character set.
_LIT(KAvailable,"Available:\n");
_LIT(KFormatting," %S\n");
Console.Printf(KAvailable);
for (TInt i=arrayOfCharacterSetsAvailable->Count()-1; i>=0; --i)
{
    const CCnvCharacterSetConverter::SCharacterSet& charactersSet=
            (*arrayOfCharacterSetsAvailable)[i];
    characterSetConverter->PrepareToConvertToOrFromL(charactersSet.Identifier(),
            *arrayOfCharacterSetsAvailable, fileServerSession);
    TPtrC charactersSetName=charactersSet.Name();
    Console.Printf(KFormatting, &charactersSetName);
}
The character set array is passed as an argument to the PrepareToConvertToOrFromL() function, along with the file server session and the UID of the character set. In the example below the UID is hard-coded as ASCII. If the character set is not in the array, the function panics the thread.
// pass array to PrepareToConvertToOrFromL()
characterSetConverter->PrepareToConvertToOrFromL(KCharacterSetIdentifierAscii,
        *arrayOfCharacterSetsAvailable, fileServerSession);
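The trade-off between the two variants can be modelled in a few lines of standard C++. In this hypothetical sketch a std::vector of CharacterSetInfo entries stands in for the array returned by CreateArrayOfCharacterSetsAvailableLC(): it is built once, after which each lookup only searches memory. FindCharacterSet() returns nullptr for an unknown identifier so that the sketch stays testable, whereas the PrepareToConvertToOrFromL() overload that takes the array panics the thread instead. CharacterSetInfo and FindCharacterSet() are illustrative names, not part of the CharConv API, and the identifiers stored in such a list would be the real character set UIDs in practice.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Illustrative stand-in for CCnvCharacterSetConverter::SCharacterSet.
struct CharacterSetInfo
{
    std::uint32_t identifier;  // plays the role of SCharacterSet::Identifier()
    std::string name;          // plays the role of SCharacterSet::Name()
};

// Searches the prebuilt list (built once, like the array of available character
// sets) and returns the matching entry, or nullptr if the identifier is absent.
const CharacterSetInfo* FindCharacterSet(const std::vector<CharacterSetInfo>& available,
                                         std::uint32_t identifier)
{
    for (const CharacterSetInfo& entry : available)
    {
        if (entry.identifier == identifier)
            return &entry;
    }
    return nullptr;
}
```

Building the list is the expensive step, so it pays off when many lookups follow, which matches the situations described above: presenting the list to the user, or switching frequently between character sets.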