Chapter 8 Code set conversion
omniORB 4.0 supports full code set negotiation, used to select and
translate between different character code sets, for the transmission
of chars, strings, wchars and wstrings. The support is mostly
transparent to application code, but there are a number of options
that can be selected. This chapter covers the options, and also gives
some pointers about how to implement your own code sets, in case the
ones that come with omniORB are not sufficient.
8.1 Native code set
For the ORB to know how to handle strings given to it by the
application, it must know which code set they are encoded in, so
it can translate them properly if need be. The default is ISO 8859-1
(Latin 1). A different code set can be chosen at initialisation time
with the nativeCharCodeSet parameter. The supported code sets
are printed out at initialisation time if the ORB traceLevel is 15 or
greater.
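
As a minimal sketch of selecting the native code set at initialisation time, the following builds an argument list for CORBA.ORB_init(). It assumes the parameters are passed in omniORB's usual command-line form, with an -ORB prefix. The helper names orb_args and init_orb are hypothetical and are not part of omniORBpy.

```python
import sys


def orb_args(argv, native_char_code_set=None, trace_level=None):
    """Return a copy of argv with omniORB parameters appended.

    Hypothetical helper; assumes the -ORB<name> <value> argument form.
    """
    args = list(argv)
    if native_char_code_set is not None:
        args += ["-ORBnativeCharCodeSet", native_char_code_set]
    if trace_level is not None:
        # At traceLevel 15 or greater the ORB prints the supported code sets.
        args += ["-ORBtraceLevel", str(trace_level)]
    return args


def init_orb(argv, **options):
    """Initialise the ORB with the extra parameters (requires omniORBpy)."""
    from omniORB import CORBA  # imported lazily so orb_args works without it
    return CORBA.ORB_init(orb_args(argv, **options), CORBA.ORB_ID)
```

A program might then call init_orb(sys.argv, native_char_code_set="UTF-8", trace_level=15) once at start-up and check the trace output for the list of supported code sets.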
For most applications, the default is fine. Some applications may need
to set the native char code set to UTF-8, allowing the full Unicode
range to be supported in strings.
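
The difference in range between the two code sets can be illustrated with Python's own codecs. This is only an illustration of what each code set can represent; it does not exercise omniORB's translation code, and the helper name encodable is hypothetical.

```python
def encodable(text, codec):
    """Return True if every character of text can be encoded with codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# U+00E9 (e with acute accent) is within ISO 8859-1.
latin_ok = encodable(u"caf\u00e9", "iso-8859-1")

# U+20AC (euro sign) is outside ISO 8859-1 but can be encoded as UTF-8.
euro_in_latin1 = encodable(u"\u20ac", "iso-8859-1")
euro_in_utf8 = encodable(u"\u20ac", "utf-8")
```

A string containing characters such as the euro sign therefore cannot be represented with the default native code set, which is the case where UTF-8 is worth selecting.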
In omniORBpy, wchar and wstring are always represented by the Python
Unicode type, so there is no need to select a native code set for
wchar.
8.2 Code set library
To save space in the main ORB core library, most of the code set
implementations are in a separate library. To load it from Python, you
must import the omniORB.codesets module before calling
CORBA.ORB_init().
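
The required ordering can be sketched as follows. The function name init_orb_with_codesets is hypothetical; the only point being shown is that the omniORB.codesets import comes before the call to CORBA.ORB_init().

```python
import sys


def init_orb_with_codesets(argv):
    """Load the code set library, then initialise the ORB.

    Requires omniORBpy to be installed.
    """
    # Imported only for its side effect of loading the code set library.
    # This must happen before CORBA.ORB_init() is called.
    import omniORB.codesets  # noqa: F401
    from omniORB import CORBA

    return CORBA.ORB_init(list(argv), CORBA.ORB_ID)
```

In an ordinary program the same effect is usually obtained by placing import omniORB.codesets with the other imports at the top of the main module, so that it runs before any ORB initialisation code.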
8.3 Implementing new code sets
Code sets must currently be implemented in C++. See the omniORB for
C++ documentation for details.