On the wire, Ice transmits all strings as Unicode strings in UTF‑8 encoding (see Chapter 34). For languages other than C++, Ice uses strings in their language-native Unicode representation and converts automatically to and from UTF‑8 for transmission, so applications can transparently use characters from non-English alphabets.
However, for C++, how strings are represented inside a process depends on the platform and on which mapping is chosen for a particular string: the default mapping to std::string, or the alternative mapping to std::wstring (see Section 6.6.1). This section explains how strings are encoded by the Ice for C++ run time, and how you can achieve automatic conversion of strings in their native representation to and from UTF‑8.
By default, the Ice for C++ run time treats strings as follows:
• Narrow strings (that is, strings mapped to std::string) are presented to the application in UTF‑8 encoding and, similarly, the application is expected to provide narrow strings in UTF‑8 encoding to the Ice run time for transmission.
• Wide strings (that is, strings mapped to std::wstring) are automatically encoded as Unicode by the Ice run time as appropriate for the platform. For example, on Windows, the Ice run time converts between UTF‑8 and little-endian UTF‑16, whereas on Linux it converts between UTF‑8 and UTF‑32 in the endianness of the host CPU.
The default behavior of the run time can be changed by providing application-specific string converters. If you install such converters, all Slice strings will be passed to the appropriate converter when they are marshaled and unmarshaled. Therefore, the string converters allow you to convert all strings transparently into their native representation without having to insert explicit conversion calls whenever a string crosses a Slice interface boundary.
You can install string converters on a per-communicator basis when you create a communicator by setting the stringConverter and wstringConverter members of the InitializationData structure (see Section 28.3). Any strings that use the default (std::string) mapping are passed through the specified stringConverter, and any strings that use the wide (std::wstring) mapping are passed through the specified wstringConverter.
namespace Ice {

    class UTF8Buffer {
    public:
        virtual Byte* getMoreBytes(size_t howMany,
                                   Byte* firstUnused) = 0;
        virtual ~UTF8Buffer() {}
    };

    template<typename charT>
    class BasicStringConverter : public IceUtil::Shared {
    public:
        virtual Byte*
        toUTF8(const charT* sourceStart, const charT* sourceEnd,
               UTF8Buffer&) const = 0;
        virtual void fromUTF8(const Byte* sourceStart,
                              const Byte* sourceEnd,
                              std::basic_string<charT>& target) const;
    };

    typedef BasicStringConverter<char> StringConverter;
    typedef IceUtil::Handle<StringConverter> StringConverterPtr;

    typedef BasicStringConverter<wchar_t> WstringConverter;
    typedef IceUtil::Handle<WstringConverter> WstringConverterPtr;
}
As you can see, both narrow and wide string converters are simply instantiations of the same class template, with either a narrow or a wide character type (char or wchar_t) as the template argument.
If you have a string converter installed, the Ice run time calls the toUTF8 function whenever it needs to convert a native string into UTF‑8 representation for transmission. The sourceStart and sourceEnd pointers point at the first character and one beyond the last character of the source string, respectively. The implementation of toUTF8 must return a pointer to the first unused byte following the converted string.
Your implementation of toUTF8 must allocate the returned string by calling the getMoreBytes member function of the UTF8Buffer class that is passed as the third argument. (getMoreBytes throws a MemoryLimitException if it cannot allocate enough memory.) The firstUnused parameter must point at the first unused byte of the allocated memory region. You can make several calls to getMoreBytes to incrementally allocate memory for the converted string. If you do, getMoreBytes may relocate the buffer in memory. (If it does, it copies the part of the string that was converted so far into the new memory region.) The function returns a pointer to the first unused byte of the (possibly relocated) memory.
Conversion with toUTF8 can fail because getMoreBytes can cause the message size to exceed Ice.MessageSizeMax. In this case, you should let the MemoryLimitException thrown by getMoreBytes propagate to the caller.
Conversion can also fail because the encoding of the source string is internally incorrect. In that case, you should throw a StringConversionFailed exception from toUTF8.
During unmarshaling, the Ice run time calls the fromUTF8 member function on the corresponding string converter. The function converts a UTF‑8 string into its native form as a std::string or, for a wide string converter, a std::wstring. (The string into which the function must place the converted characters is passed to fromUTF8 as the target parameter.)
28.23.5 The iconv String Converter
For Linux and Unix platforms, Ice provides an IconvStringConverter template class that uses the iconv conversion facility to convert between the native encoding and UTF‑8. The only member function of interest is the constructor:
template<typename charT>
class IconvStringConverter
    : public Ice::BasicStringConverter<charT>
{
public:
    IconvStringConverter(const char* = nl_langinfo(CODESET));
    // ...
};
To use this string converter, you specify whether the conversion you want is for narrow or wide characters via the template argument, and you specify the corresponding native encoding with the constructor argument. For example, to create a converter that converts between ISO Latin‑1 and UTF‑8, you instantiate IconvStringConverter<char> and pass the iconv name of the Latin‑1 encoding (typically ISO-8859-1) to the constructor.
The string you pass to the constructor must be one of the values returned by iconv ‑l, which lists all the available character encodings for your machine.
Using the IconvStringConverter template makes it easy to install code converters for any available encoding without having to explicitly write (or call) conversion routines whose implementation is typically non-trivial.
The Ice run time includes a plugin that supports conversion between UTF-8 and native encodings on Unix and Windows platforms. You can use this plugin to install converters for narrow and wide strings into the communicator of an existing program. This feature is primarily intended for use in scripting language extensions such as Ice for Python; if you need to use string converters in your C++ application, we recommend using the technique described in Section 28.23.1 instead.
Note that an application must be designed to operate correctly in the presence of a string converter. A string converter assumes that it converts strings in the native encoding into the UTF-8 encoding, and vice versa. An application that performs its own conversions on strings that cross a Slice interface boundary can cause encoding errors when those strings are processed by a converter.
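A minimal sketch of the configuration property that installs the plugin, assuming the plugin is named Converter and uses the entry point discussed in the next paragraph. The trailing placeholder stands for the plugin's encoding arguments, which are not spelled out in this sketch:

```
Ice.Plugin.Converter=Ice:createStringConverter <arguments>
```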
You can use any name you wish for the plugin; in this example, we used Converter. The first component of the property value represents the plugin’s entry point, which includes the abbreviated name of the shared library or DLL (Ice) and the name of a factory function (createStringConverter).
The plugin’s argument semantics are designed so that the same configuration property can be used on both Windows and Unix platforms, as shown in the following example:
If the configuration file containing this property is shared by programs in multiple implementation languages, you can use an alternate form of the property name with a .cpp suffix (for example, Ice.Plugin.Converter.cpp), which is loaded only by the Ice for C++ run time.
Refer to Appendix C for more information on the Ice.Plugin properties.
If the string converter plugin described in Section 28.23.6 does not satisfy your requirements, you can implement your own solution with help from the StringConverterPlugin class:
namespace Ice {
    class StringConverterPlugin : public Ice::Plugin {
    public:
        StringConverterPlugin(const CommunicatorPtr& communicator,
                              const StringConverterPtr&,
                              const WstringConverterPtr& = 0);
        virtual void initialize();
        virtual void destroy();
    };
}
The converters are installed by the StringConverterPlugin constructor (you can supply an argument of 0 for either converter if you do not wish to install it). The initialize and destroy methods are empty, but you can subclass StringConverterPlugin and override these methods if necessary.
The first component of the property value represents the plugin’s entry point, which includes the abbreviated name of the shared library or DLL (myconverter) and the name of a factory function (createConverter).
If the configuration file containing this property is shared by programs in multiple implementation languages, you can use an alternate form of the property name with a .cpp suffix, which is loaded only by the Ice for C++ run time.
Refer to Appendix C for more information on the Ice.Plugin properties.