
Introduction to i18n
Chapter 8 - Input from Keyboard


It is obvious that a text editor must accept text from the keyboard; otherwise it is useless. Similarly, an internationalized text editor must be able to input the characters used by various languages. The same holds for other software: shells; libraries such as readline; environments such as consoles and X terminal emulators; scripting languages such as Perl, Tcl/Tk, Python, and Ruby; and applications such as word processors, drawing and painting programs, file managers such as Midnight Commander, web browsers, and mailers. Without internationalized text input, such software is of little use to people who write in those languages.

The world has many languages, and the appropriate input method varies from language to language.

Different technologies are used for different languages. The aim of this chapter is to introduce them.


8.1 Non-X Software

Ideally, supplying an input method is the responsibility of the console or X terminal emulator. This is already the case for languages that do not need complicated input methods, so non-X software for those languages does not need to care about input methods.

A few Debian packages provide consoles and X terminal emulators that supply input methods for particular languages.

xiterm in xiterm+thai package
    Thai characters
hanterm
    Korean Hangul
cxtermb5 in cxterm-big5 package
    Big5 traditional Chinese ideograms
cce
    CN-GB simplified Chinese ideograms

In addition, a few programs supply input methods for an existing console environment.

skkfep
    Japanese (needs SKK as a conversion engine)
uum
    Japanese (needs Wnn as a conversion engine; not available as a Debian package)
canuum
    Japanese (needs Canna as a conversion engine; not available as a Debian package)

However, because input methods for complex languages have historically not been available from the console or terminal emulator, a few non-X programs were developed with their own built-in input methods.

jvim-canna
    A text editor which can input Japanese (needs Canna as a conversion engine).
jed-canna
    A text editor which can input Japanese (needs Canna as a conversion engine).
nvi-m17n-canna
    A text editor which can input Japanese (needs Canna as a conversion engine).

You have to take care of the differences between the number of characters, the number of columns, and the number of bytes. For example, if bash is linked against a readline without multibyte support, you will see immediately that it cannot handle UTF-8 input properly: start bash in a UTF-8 xterm, type a non-ASCII character, and press the BackSpace key. The problem is that such a readline always erases one column on the screen and one byte in the internal buffer for each stroke of the BackSpace key. To solve this problem, wide characters should be used for internal processing. One stroke of BackSpace should erase wcwidth() columns on the screen and one wchar_t unit in the internal buffer.
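
As a rough illustration of the last point, the following C sketch keeps the edit line in a wchar_t buffer, so that one BackSpace removes exactly one wchar_t unit and reports wcwidth() columns to erase. The struct line_buf type and the erase_last_char() function are hypothetical names invented for this example; they are not taken from readline or bash.

```c
#define _XOPEN_SOURCE 700   /* glibc declares wcwidth() only when this is set */
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical edit buffer that stores wide characters, not bytes. */
struct line_buf {
    wchar_t text[256];
    size_t  len;            /* number of wchar_t units in use */
};

/*
 * Remove the last character from the buffer and return the number of
 * screen columns the caller should erase (0 if the buffer was empty).
 */
int erase_last_char(struct line_buf *lb)
{
    assert(lb != NULL);
    if (lb->len == 0)
        return 0;

    /* One BackSpace removes exactly one wchar_t unit. */
    wchar_t wc = lb->text[--lb->len];
    lb->text[lb->len] = L'\0';

    /* wcwidth() returns -1 for non-printable characters; erase nothing then. */
    int cols = wcwidth(wc);
    return cols > 0 ? cols : 0;
}
```

A real program would call setlocale(LC_ALL, "") at startup, because the result of wcwidth() depends on the LC_CTYPE locale, and would convert keyboard input to wide characters with a function such as mbrtowc() before storing it in the buffer.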


8.2 X Software

X11R5 was the first internationalized version of the X Window System. However, X11R5 shipped two sample implementations of international text input, Xsi and Ximp, and the existence of two different protocols was a nuisance. X11R6 then adopted XIM, a new protocol for internationalized text input, as the standard. Internationalized X software should support text input using XIM.

These protocols are designed around a client-server model. The client calls the server when necessary, and the server converts keystrokes into internationalized text.

Kinput and kinput2 are protocols for Japanese text input that predate X11R5. Some software, such as kterm, supports the kinput2 protocol. The server program is also called kinput2. Since the current version of kinput2 supports the XIM protocol, you don't need to support the kinput protocol.


8.2.1 Developing XIM clients

***** Not written yet *****

Developing an XIM client is a bit complicated. The source code of rxvt and xedit is a good place to study it.

Programming Japanese character input is a good introduction to XIM programming.


8.2.2 Examples of XIM software

The following are examples of software which can work as XIM clients.

The following are examples of software which can work as XIM servers.


8.2.3 Using XIM software

This section explains how to use XIM input on a Debian system. It should help developers and package maintainers who want to test the XIM support of their software. Debian Woody or later is assumed.

First, the locale database has to be prepared. Uncomment the ja_JP.EUC-JP EUC-JP, ko_KR.EUC-KR EUC-KR, zh_CN.GB2312, and zh_TW BIG5 lines in /etc/locale.gen and run /usr/sbin/locale-gen. This generates the locale database under /usr/lib/locale/. On systems other than Debian Woody or later, follow the procedure appropriate to that system to prepare the locale database.
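
Whether a locale was actually generated can be checked from a program. The following is a minimal sketch in C, assuming a POSIX.1-2008 C library such as glibc that provides newlocale(). The name locale_is_available() is a hypothetical helper chosen for this example, not a standard function.

```c
#define _POSIX_C_SOURCE 200809L   /* for newlocale() and freelocale() */
#include <locale.h>
#include <stddef.h>

/*
 * Hypothetical helper: return 1 if the named locale can be loaded from
 * the locale database, 0 otherwise. The global locale is not changed.
 */
int locale_is_available(const char *name)
{
    if (name == NULL)
        return 0;

    /* newlocale() fails when no locale data exists for this name. */
    locale_t loc = newlocale(LC_ALL_MASK, name, (locale_t)0);
    if (loc == (locale_t)0)
        return 0;

    freelocale(loc);
    return 1;
}
```

For example, locale_is_available("ja_JP.EUC-JP") should return 1 once the corresponding line in /etc/locale.gen has been uncommented and locale-gen has been run.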

Basic Chinese, Japanese, and Korean X fonts are included in the xfonts-base package in Debian Woody and later.

An XIM server must be installed. For Japanese, the kinput2 and skkinput packages are available: kinput2 supports the Canna and FreeWnn input engines, and skkinput supports SKK. For Korean, ami is available. For traditional and simplified Chinese, xcin is available.

Of course, you also need an XIM client. xedit, in the xbase-clients package, is one example.

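Then log in as a non-root user. The environment variables LC_ALL (or LANG) and XMODIFIERS must be set to match the language and the XIM server: LC_ALL (or LANG) names one of the locales prepared above, and XMODIFIERS takes the form @im=server-name.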
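
The input method modifier in XMODIFIERS has the form @im=name, where name identifies the XIM server; for example, @im=kinput2 would select kinput2, assuming the server registers itself under that name. The following C sketch extracts that name, so that a test program can report which server a client would try to contact. xim_server_name() is a hypothetical helper written for this example and is not part of Xlib.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy the input method name that follows "@im="
 * in an XMODIFIERS-style string into out. Returns 1 on success, or 0
 * if no non-empty name is present or the name does not fit into out.
 */
int xim_server_name(const char *xmodifiers, char *out, size_t outsz)
{
    static const char key[] = "@im=";

    if (xmodifiers == NULL || out == NULL || outsz == 0)
        return 0;

    const char *start = strstr(xmodifiers, key);
    if (start == NULL)
        return 0;
    start += sizeof key - 1;

    /* The name ends at the next modifier ("@...") or at the end of the string. */
    size_t n = strcspn(start, "@");
    if (n == 0 || n >= outsz)
        return 0;

    memcpy(out, start, n);
    out[n] = '\0';
    return 1;
}
```

A caller would typically pass getenv("XMODIFIERS") as the first argument. An actual X client does not need to parse the variable itself; it can call XSetLocaleModifiers("") and let Xlib interpret it.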

Then start the XIM server in the background (append & to the command). kinput2 and ami do not open a new window, while xcin opens one and prints some messages.

Then start the XIM client and give focus to one of its input areas. Press Shift-Space or Control-Space and type something. Did some unfamiliar characters appear? This document is too brief to explain how to enter valid CJK characters and sentences with these XIM servers; please consult the documentation of each XIM server.


8.3 Emacsen

GNU Emacs and XEmacs use an entirely different model for international input.

They supply their own input methods for many languages and use these instead of relying on the console or XIM. An input method is selected with the M-x set-input-method command, and the selected input method is switched on and off with the M-x toggle-input-method command.

GNU Emacs supplies input methods for British, Catalan, Chinese (array30, 4corner, b5-quick, cns-quick, cns-tsangchi, ctlau, ctlaub, ecdict, etzy, punct, punct-b5, py, py-b5, py-punct, py-punct-b5, qj, qj-b5, sw, tonepy, ziranma, zozy), Czech, Danish, Devanagari, Esperanto, Ethiopic, Finnish, French, German, Greek, Hebrew, Icelandic, IPA, Irish, Italian, Japanese (egg-wnn, skk), Korean (hangul, hangul3, hanja, hanja3), Lao, Norwegian, Portuguese, Romanian, Scandinavian, Slovak, Spanish, Swedish, Thai, Tibetan, Turkish, Vietnamese, Latin-{1,2,3,4,5}, Cyrillic (beylorussian, jcuken, jis-russian, macedonian, serbian, translit, translit-bulgarian, ukrainian, yawerty), and so on.




17 June 2006

Tomohiro KUBOTA