Chapter 8 Code set conversion

omniORB 4.0 supports full code set negotiation, which is used to select, and translate between, character code sets when transmitting chars, strings, wchars and wstrings. The support is mostly transparent to application code, but a number of options can be selected. This chapter covers those options, and also gives some pointers on implementing your own code sets, in case the ones that come with omniORB are not sufficient.

8.1 Native code set

For the ORB to know how to handle strings given to it by the application, it must know which code set they are represented in, so that it can translate them correctly when necessary. The default is ISO 8859-1 (Latin 1). A different code set can be chosen at initialisation time with the nativeCharCodeSet parameter. The supported code sets are printed out at initialisation time if the ORB traceLevel is 15 or greater.
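omniORB configuration parameters can normally also be given on the command line in the form `-ORB<parameter> value`, so one way to select the native char code set and raise the trace level is sketched below. This is a minimal sketch under that assumption: the helper names `orb_args` and `init_orb` are hypothetical, and only `CORBA.ORB_init()` and the `nativeCharCodeSet` and `traceLevel` parameters come from omniORB.

```python
from typing import List, Optional


def orb_args(argv: List[str], native_char: str = "UTF-8",
             trace_level: Optional[int] = None) -> List[str]:
    """Return a copy of argv with omniORB -ORB<parameter> options appended.

    Hypothetical helper; assumes omniORB accepts -ORB<parameter> <value>.
    """
    args = list(argv)  # copy, so the caller's list is left untouched
    args += ["-ORBnativeCharCodeSet", native_char]
    if trace_level is not None:
        # A traceLevel of 15 or more prints the supported code sets at start-up.
        args += ["-ORBtraceLevel", str(trace_level)]
    return args


def init_orb(argv: List[str]):
    """Initialise an ORB with a UTF-8 native char code set (needs omniORBpy)."""
    # Imported lazily so that orb_args() can be used without omniORBpy installed.
    from omniORB import CORBA
    return CORBA.ORB_init(orb_args(argv, trace_level=15), CORBA.ORB_ID)
```

A program would then call `orb = init_orb(sys.argv)` once at start-up, before using any other CORBA facilities.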

For most applications, the default is fine. Some applications may need to set the native char code set to UTF-8, allowing the full Unicode range to be supported in strings.
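To illustrate why the default can be limiting, the plain-Python sketch below (it uses no omniORB API) checks whether a piece of text can be represented at all in a given code set. ISO 8859-1 covers only code points up to U+00FF, so characters such as the euro sign or Greek letters cannot be encoded in it, whereas UTF-8 can encode every Unicode character. The function name `can_encode` is hypothetical.

```python
def can_encode(text: str, codec: str) -> bool:
    """Return True if every character of text is representable in codec."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


sample = "café €5 Ωmega"
print(can_encode(sample, "iso-8859-1"))  # False: € and Ω lie outside Latin 1
print(can_encode(sample, "utf-8"))       # True
```

Note that "é" alone would pass the Latin 1 check, since it is U+00E9; it is the characters above U+00FF that require a wider native code set.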

In omniORBpy, wchar and wstring are always represented by the Python Unicode type, so there is no need to select a native code set for wchar.

8.2 Code set library

To save space in the main ORB core library, most of the code set implementations are in a separate library. To load it from Python, you must import the omniORB.codesets module before calling CORBA.ORB_init().
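The required ordering can be kept in one place with a small wrapper, sketched below. The function name `init_orb_with_codesets` is hypothetical; `omniORB.codesets` and `CORBA.ORB_init()` are the names given above, and the import is done purely for its side effect of loading the separate code set library.

```python
from typing import List


def init_orb_with_codesets(argv: List[str]):
    """Load omniORB's extra code set library, then initialise the ORB."""
    # Imported only for its side effect. It must happen before ORB_init(),
    # otherwise the code sets in the separate library are not available.
    import omniORB.codesets  # noqa: F401
    from omniORB import CORBA
    return CORBA.ORB_init(argv, CORBA.ORB_ID)
```

Calling `init_orb_with_codesets(sys.argv)` at the top of the program's start-up code ensures the import can never accidentally be moved after ORB initialisation.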

8.3 Implementing new code sets

Code sets must currently be implemented in C++. See the omniORB for C++ documentation for details.