Copyright © 2003-2010 ZeroC, Inc.

32.24 String Conversion

On the wire, Ice transmits all strings as Unicode strings in UTF‑8 encoding (see Chapter 37). For languages other than C++, Ice uses strings in their language-native Unicode representation and converts automatically to and from UTF‑8 for transmission, so applications can transparently use characters from non-English alphabets.
However, for C++, how strings are represented inside a process depends on the platform and on the mapping chosen for a particular string: the default mapping to std::string, or the alternative mapping to std::wstring (see Section 6.6.1; see also footnote 1). This section explains how the Ice for C++ run time encodes strings, and how you can achieve automatic conversion of strings in their native representation to and from UTF‑8 (see footnote 2).
By default, the Ice run time encodes strings as follows:
• Narrow strings (that is, strings mapped to std::string) are presented to the application in UTF‑8 encoding and, similarly, the application is expected to provide narrow strings in UTF‑8 encoding to the Ice run time for transmission.
With this default behavior, the application code is responsible for converting between the native codeset for 8‑bit characters and UTF‑8. For example, if the native codeset is ISO Latin‑1, the application is responsible for converting between UTF‑8 and narrow (8‑bit) characters in ISO Latin‑1 encoding.
Also note that the default behavior does not require the application to do anything if it only uses characters in the ASCII range. (This is because a string containing only characters in the (7‑bit) ASCII range is also a valid UTF‑8 string.)
• Wide strings (that is, strings mapped to std::wstring) are automatically encoded as Unicode by the Ice run time as appropriate for the platform. For example, for Windows, the Ice run time converts between UTF‑8 and UTF‑16 in little-endian representation whereas, for Linux, the Ice run time converts between UTF‑8 and UTF‑32 in the endian-ness appropriate for the host CPU.
With this default behavior, wide strings are transparently converted between their on-the-wire representation and their native C++ representation as appropriate, so application code need not do anything special. (The exception is if an application uses a non-Unicode encoding, such as Shift‑JIS, as its native wstring codeset.)
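As a rough illustration of the narrow-string rule above, the following sketch checks whether a std::string contains only 7-bit ASCII characters, in which case it is already valid UTF‑8 and can be handed to the Ice run time unchanged. The helper name isAscii is hypothetical and is not part of Ice.

```cpp
#include <string>

// Hypothetical helper (not part of Ice): returns true if every byte of s
// lies in the 7-bit ASCII range, so that s is already valid UTF-8.
bool isAscii(const std::string& s)
{
    for (std::string::size_type i = 0; i < s.size(); ++i) {
        if (static_cast<unsigned char>(s[i]) > 0x7F) {
            return false; // a byte outside ASCII: the string needs conversion
        }
    }
    return true;
}
```

A string for which isAscii returns false, such as ISO Latin‑1 text containing accented characters, must be converted to UTF‑8 by the application (or by an installed string converter) before it is passed to the Ice run time.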

32.24.1 Installing String Converters

The default behavior of the run time can be changed by providing application-specific string converters. If you install such converters, all Slice strings will be passed to the appropriate converter when they are marshaled and unmarshaled. Therefore, the string converters allow you to convert all strings transparently into their native representation without having to insert explicit conversion calls whenever a string crosses a Slice interface boundary.
You can install string converters on a per-communicator basis when you create a communicator by setting the stringConverter and wstringConverter members of the InitializationData structure (see Section 32.3). Any strings that use the default (std::string) mapping are passed through the specified stringConverter, and any strings that use the wide (std::wstring) mapping are passed through the specified wstringConverter.
The string converters are defined as follows:
namespace Ice {

class UTF8Buffer {
public:
    virtual Byte* getMoreBytes(size_t howMany,
                               Byte* firstUnused) = 0;
    virtual ~UTF8Buffer() {}
};

template<typename charT>
class BasicStringConverter : public IceUtil::Shared {
public:
    virtual Byte*
        toUTF8(const charT* sourceStart, const charT* sourceEnd,
               UTF8Buffer&) const = 0;

    virtual void fromUTF8(const Byte* sourceStart,
                          const Byte* sourceEnd,
                          std::basic_string<charT>& target) const = 0;
};

typedef BasicStringConverter<char> StringConverter;
typedef IceUtil::Handle<StringConverter> StringConverterPtr;

typedef BasicStringConverter<wchar_t> WstringConverter;
typedef IceUtil::Handle<WstringConverter> WstringConverterPtr;

}
As you can see, both narrow and wide string converters are simply instantiations of the BasicStringConverter template, with either a narrow or a wide character type (char or wchar_t) as the template parameter.

32.24.2 Converting to UTF‑8

If you have a string converter installed, the Ice run time calls the toUTF8 function whenever it needs to convert a native string into UTF‑8 representation for transmission. The sourceStart and sourceEnd pointers point at the first character and one beyond the last character of the source string, respectively. The implementation of toUTF8 must return a pointer to the first unused byte following the converted string.
Your implementation of toUTF8 must allocate the returned string by calling the getMoreBytes member function of the UTF8Buffer class that is passed as the third argument. (getMoreBytes throws a MemoryLimitException if it cannot allocate enough memory.) The firstUnused parameter must point at the first unused byte of the allocated memory region. You can make several calls to getMoreBytes to incrementally allocate memory for the converted string. If you do, getMoreBytes may relocate the buffer in memory. (If it does, it copies the part of the string that was converted so far into the new memory region.) The function returns a pointer to the first unused byte of the (possibly relocated) memory.
Conversion with toUTF8 can fail because getMoreBytes can cause the message size to exceed Ice.MessageSizeMax. In this case, you should let the MemoryLimitException thrown by getMoreBytes propagate to the caller.
Conversion can also fail because the encoding of the source string is internally incorrect. In that case, you should throw a StringConversionFailed exception from toUTF8.
After it has marshaled the returned string into an internal marshaling buffer, the Ice run time deallocates the string.
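The allocation protocol described above can be sketched without the Ice headers. The block below is only an illustration under stated assumptions: Byte and UTF8Buffer are local stand-ins that mirror the declarations shown in Section 32.24.1, VectorBuffer is a hypothetical buffer used in place of the one the Ice run time supplies, and latin1ToUTF8 is a hypothetical free function with the same parameters as toUTF8 for an ISO Latin‑1 native codeset. The sketch also assumes that a null firstUnused pointer is passed on the first call, when nothing has been allocated yet.

```cpp
#include <cstddef>
#include <string>
#include <vector>

typedef unsigned char Byte; // stand-in for Ice::Byte

// Stand-in mirroring the Ice::UTF8Buffer interface shown earlier.
class UTF8Buffer {
public:
    virtual Byte* getMoreBytes(std::size_t howMany, Byte* firstUnused) = 0;
    virtual ~UTF8Buffer() {}
};

// Hypothetical buffer backed by std::vector; growing it may relocate the
// storage, in which case the bytes written so far are copied along.
class VectorBuffer : public UTF8Buffer {
public:
    virtual Byte* getMoreBytes(std::size_t howMany, Byte* firstUnused)
    {
        // Number of bytes already in use (0 if firstUnused is null).
        std::size_t used = firstUnused
            ? static_cast<std::size_t>(firstUnused - _bytes.data()) : 0;
        _bytes.resize(used + howMany);
        return _bytes.data() + used; // first unused byte after (re)allocation
    }

    // Returns the converted bytes from the start of the buffer up to end.
    std::string upTo(const Byte* end) const
    {
        return std::string(reinterpret_cast<const char*>(_bytes.data()),
                           static_cast<std::size_t>(end - _bytes.data()));
    }

private:
    std::vector<Byte> _bytes;
};

// Hypothetical analogue of toUTF8 for ISO Latin-1 source strings.
Byte* latin1ToUTF8(const char* sourceStart, const char* sourceEnd,
                   UTF8Buffer& buffer)
{
    // Worst case: every Latin-1 character needs two UTF-8 bytes.
    std::size_t n = static_cast<std::size_t>(sourceEnd - sourceStart);
    Byte* out = buffer.getMoreBytes(n > 0 ? 2 * n : 1, 0);
    for (const char* p = sourceStart; p != sourceEnd; ++p) {
        Byte c = static_cast<Byte>(*p);
        if (c < 0x80) {
            *out++ = c; // ASCII maps to itself
        } else {
            *out++ = static_cast<Byte>(0xC0 | (c >> 6));   // lead byte
            *out++ = static_cast<Byte>(0x80 | (c & 0x3F)); // continuation
        }
    }
    return out; // first unused byte following the converted string
}
```

Because every Latin‑1 character is a valid code point, this particular conversion cannot encounter a malformed source string. A converter for a multi-byte native encoding would also need the error path described above.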

32.24.3 Converting from UTF‑8

During unmarshaling, the Ice run time calls the fromUTF8 member function on the corresponding string converter. The function converts a UTF‑8 string into its native form as a std::basic_string<charT>, that is, a std::string for a narrow string converter or a std::wstring for a wide string converter. (The string into which the function must place the converted characters is passed to fromUTF8 as the target parameter.)
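The reverse direction can be sketched in the same spirit. The function below is a hypothetical free-function analogue of fromUTF8 for an ISO Latin‑1 native codeset. Byte is a local stand-in for Ice::Byte, the name utf8ToLatin1 is an assumption, and std::runtime_error stands in for whatever exception a real converter would raise on unconvertible input.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

typedef unsigned char Byte; // stand-in for Ice::Byte

// Hypothetical analogue of fromUTF8: decodes UTF-8 into ISO Latin-1 and
// places the result in target.
void utf8ToLatin1(const Byte* sourceStart, const Byte* sourceEnd,
                  std::string& target)
{
    std::string result;
    result.reserve(static_cast<std::size_t>(sourceEnd - sourceStart));
    for (const Byte* p = sourceStart; p != sourceEnd; ++p) {
        if (*p < 0x80) {
            result += static_cast<char>(*p); // ASCII maps to itself
        } else if ((*p == 0xC2 || *p == 0xC3) && p + 1 != sourceEnd &&
                   (p[1] & 0xC0) == 0x80) {
            // Two-byte sequence encoding U+0080..U+00FF.
            Byte c = static_cast<Byte>(((*p & 0x03) << 6) | (p[1] & 0x3F));
            result += static_cast<char>(c);
            ++p; // skip the continuation byte
        } else {
            // Malformed UTF-8, or a character with no Latin-1 equivalent.
            throw std::runtime_error("cannot convert UTF-8 to ISO Latin-1");
        }
    }
    target.swap(result); // modify target only if the whole string converted
}
```

Building the result in a local string and swapping it into target at the end leaves target untouched if the conversion fails partway through.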

32.24.4 Built-In String Converters

Ice provides three string converters to cover common conversion requirements:
• UnicodeWstringConverter
This is a string converter that converts between Unicode wide strings and UTF‑8 strings. Unless you install a different string converter, this is the default converter that is used for wide strings.
• IconvStringConverter (Linux and Unix only)
This is a string converter that converts strings using the Linux and Unix iconv conversion facility (see Section 32.24.6). It can be used to convert either wide or narrow strings.
• WindowsStringConverter (Windows only)
This string converter converts between multi-byte and UTF‑8 strings and uses MultiByteToWideChar and WideCharToMultiByte for its implementation.
These string converters are defined in the Ice namespace.

32.24.5 Convenience Functions

The Ice namespace provides four convenience functions that make it easy to convert strings to and from UTF‑8:
std::string
nativeToUTF8(const Ice::StringConverterPtr&, const std::string&);

std::string
nativeToUTF8(const Ice::CommunicatorPtr&, const std::string&);

std::string
UTF8ToNative(const Ice::StringConverterPtr&, const std::string&);

std::string
UTF8ToNative(const Ice::CommunicatorPtr&, const std::string&);
The overloads allow you to either use the string converter that is configured on a communicator or to explicitly pass a specific string converter that performs the conversion.

32.24.6 The iconv String Converter

For Linux and Unix platforms, Ice provides an IconvStringConverter template class that uses the iconv conversion facility to convert between the native encoding and UTF‑8. The only member function of interest is the constructor:
template<typename charT>
class IconvStringConverter
    : public Ice::BasicStringConverter<charT>
{
public:
    IconvStringConverter(const char* = nl_langinfo(CODESET));

    // ...
};
To use this string converter, you specify whether the conversion you want is for narrow or wide characters via the template argument, and you specify the corresponding native encoding with the constructor argument. For example, to create a converter that converts between ISO Latin‑1 and UTF‑8, you can instantiate the converter as follows:
InitializationData id;
id.stringConverter = new IconvStringConverter<char>("ISO8859-1");
Similarly, to convert between the internal wide character encoding and UTF‑8, you can instantiate a converter as follows:
InitializationData id;
id.wstringConverter = new IconvStringConverter<wchar_t>("WCHAR_T");
The string you pass to the constructor must be one of the values returned by iconv -l, which lists all the available character encodings for your machine.
Using the IconvStringConverter template makes it easy to install code converters for any available encoding without having to explicitly write (or call) conversion routines whose implementation is typically non-trivial.
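To give a sense of what such conversion routines involve, the sketch below calls the POSIX iconv API directly to convert a narrow string to UTF‑8. It is not taken from the Ice sources and does not show how IconvStringConverter is implemented. The function name toUTF8WithIconv is hypothetical, and the example assumes a platform whose iconv accepts the encoding names "UTF-8" and "ISO-8859-1".

```cpp
#include <iconv.h>

#include <cerrno>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of Ice): converts input from the iconv
// encoding named fromCode to UTF-8.
std::string toUTF8WithIconv(const char* fromCode, const std::string& input)
{
    iconv_t cd = iconv_open("UTF-8", fromCode);
    if (cd == reinterpret_cast<iconv_t>(-1)) {
        throw std::runtime_error("iconv_open failed: unsupported encoding");
    }

    std::string output;
    char chunk[256];
    char* in = const_cast<char*>(input.data()); // iconv does not modify it
    std::size_t inLeft = input.size();

    while (inLeft > 0) {
        char* out = chunk;
        std::size_t outLeft = sizeof(chunk);
        std::size_t rc = iconv(cd, &in, &inLeft, &out, &outLeft);
        int err = errno; // save before any other call can change it
        output.append(chunk, sizeof(chunk) - outLeft);
        // E2BIG only means the chunk is full; loop again for the rest.
        if (rc == static_cast<std::size_t>(-1) && err != E2BIG) {
            iconv_close(cd);
            throw std::runtime_error("iconv failed: invalid input sequence");
        }
    }

    // Flush any pending shift state (a no-op for stateless encodings).
    char* out = chunk;
    std::size_t outLeft = sizeof(chunk);
    iconv(cd, 0, 0, &out, &outLeft);
    output.append(chunk, sizeof(chunk) - outLeft);

    iconv_close(cd);
    return output;
}
```

For example, toUTF8WithIconv("ISO-8859-1", latin1Text) returns the UTF‑8 form of latin1Text. On some platforms the program must be linked against a separate iconv library.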

32.24.7 The Ice String Converter Plug‑In

The Ice run time includes a plug‑in that supports conversion between UTF-8 and native encodings on Unix and Windows platforms. You can use this plug‑in to install converters for narrow and wide strings into the communicator of an existing program. This feature is primarily intended for use in scripting language extensions such as Ice for Python; if you need to use string converters in your C++ application, we recommend using the technique described in Section 32.24.1 instead.
Note that an application must be designed to operate correctly in the presence of a string converter. A string converter assumes that it converts strings in the native encoding into the UTF-8 encoding, and vice versa. An application that performs its own conversions on strings that cross a Slice interface boundary can cause encoding errors when those strings are processed by a converter.

Installing the Plug‑In

You can install the plug‑in using a configuration property like the one shown below:
Ice.Plugin.Converter=Ice:createStringConverter iconv=encoding[,encoding] windows=code-page
You can use any name you wish for the plug‑in; in this example, we used Converter. The first component of the property value represents the plug‑in’s entry point, which includes the abbreviated name of the shared library or DLL (Ice) and the name of a factory function (createStringConverter).
The plug‑in accepts the following arguments:
• iconv=encoding[,encoding]
This argument is optional on Unix platforms and ignored on Windows platforms. If specified, it defines the iconv names of the narrow string encoding and the optional wide-string encoding. If this argument is not specified, the plug‑in installs a narrow string converter that uses the default locale-dependent encoding.
• windows=code-page
This argument is required on Windows platforms and ignored on Unix platforms. The code-page value represents a code page number, such as 1252.
The plug‑in’s argument semantics are designed so that the same configuration property can be used on both Windows and Unix platforms, as shown in the following example:
Ice.Plugin.Converter=Ice:createStringConverter iconv=ISO8859-1 windows=1252
If the configuration file containing this property is shared by programs in multiple implementation languages, you can use an alternate syntax that is loaded only by the Ice for C++ run time:
Ice.Plugin.Converter.cpp=Ice:createStringConverter iconv=ISO8859-1 windows=1252
Refer to Appendix D for more information on the Ice.Plugin properties.

32.24.8 Dynamically Installing Custom String Converters

If the string converter plug‑in described in Section 32.24.7 does not satisfy your requirements, you can implement your own solution with help from the StringConverterPlugin class:
namespace Ice {
class StringConverterPlugin : public Ice::Plugin {
public:

    StringConverterPlugin(const CommunicatorPtr& communicator, 
                          const StringConverterPtr&,
                          const WstringConverterPtr& = 0);

    virtual void initialize();

    virtual void destroy();
};
}
The converters are installed by the StringConverterPlugin constructor (you can supply an argument of 0 for either converter if you do not wish to install it). The initialize and destroy methods are empty, but you can subclass StringConverterPlugin and override these methods if necessary.
In order to create a string converter plug‑in, you must do the following:
• Define and export a “factory function” that returns an instance of StringConverterPlugin (see Section 32.25.1).
• Implement the converter(s) that you will pass to the StringConverterPlugin constructor, or use the ones included with Ice (see Appendix F).
• Package your code into a shared library or DLL.
To install your plug‑in, use a configuration property like the one shown below:
Ice.Plugin.MyConverterPlugin=myconverter:createConverter ...
The first component of the property value represents the plug‑in’s entry point, which includes the abbreviated name of the shared library or DLL (myconverter) and the name of a factory function (createConverter).
If the configuration file containing this property is shared by programs in multiple implementation languages, you can use an alternate syntax that is loaded only by the Ice for C++ run time:
Ice.Plugin.MyConverterPlugin.cpp=myconverter:createConverter ...
Refer to Appendix D for more information on the Ice.Plugin properties.

1. The explanations that follow are relevant only for C++. See Sections 32.24.7 and 32.24.8 for string conversion for other languages.

2. See the demo directory in the Ice for C++ distribution for an example of how to use string converters.

