Tuesday, May 01, 2007

Kiel oni skribas ... ? Typing Esperanto characters in Ubuntu

[update 20081102: Finally, almost three years after the bug report, Ubuntu 8.10 Intrepid Ibex has full, built-in support for Esperanto letters. This post may now be considered out of date. Congratulations to the gtk+ team!]

Gtk+'s (and thus Ubuntu's) multilingual environment is not as good as it could be. Imagine that you can write in some exotic language. Now try to write a blog in that language using firefox in Ubuntu. You'll hit a wall and that's it.

For the impatient: add the line:

GTK_IM_MODULE=xim

to your /etc/environment file and reboot. [update 20080317: gtk will be fixed so that this is no longer necessary; see gnome bug 321896] For the patient: read on.
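For readers who would rather script the quick fix above than edit the file by hand, here is a minimal sketch. The helper name set_env_line and the ENV_FILE variable are made up for this illustration; only the GTK_IM_MODULE=xim line comes from the post. By default it writes to a throwaway temporary file, so nothing on the system changes until you point it at /etc/environment yourself.

```shell
#!/bin/sh
# Sketch: add KEY=VALUE to an environment file without duplicating the line.
# ENV_FILE defaults to a throwaway temp file so the sketch is safe to try;
# set ENV_FILE=/etc/environment (and run as root) to apply it for real.
ENV_FILE="${ENV_FILE:-$(mktemp)}"

# set_env_line FILE KEY VALUE (hypothetical helper, not part of any package)
set_env_line() {
    file="$1"; key="$2"; value="$3"
    touch "$file"
    if grep -q "^${key}=" "$file"; then
        # Key already present: rewrite its value via a temporary copy,
        # then write back in place so ownership and permissions are kept.
        tmp="${file}.tmp.$$"
        sed "s|^${key}=.*|${key}=${value}|" "$file" > "$tmp" && cat "$tmp" > "$file"
        rm -f "$tmp"
    else
        # Make sure the last existing line ends with a newline before appending.
        if [ -n "$(tail -c 1 "$file")" ]; then
            echo >> "$file"
        fi
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}

set_env_line "$ENV_FILE" GTK_IM_MODULE xim
# Show the resulting line.
grep '^GTK_IM_MODULE=' "$ENV_FILE"
```

Running it twice leaves a single GTK_IM_MODULE line in the file, which is the point of checking with grep before appending.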

Interestingly, a set of small bugs, still present in Feisty Fawn, makes me unable to write in Esperanto, a constructed European language (that I am learning just for fun) that uses the following special characters: ĉ, ĝ, ĥ, ĵ, ŝ and ŭ. The first and most basic problem is this: gtk+ could allow anybody to write such characters with zero configuration (on a typical international keyboard), but it doesn't. Qt-based and Xlib-based applications do. If I used Kubuntu I wouldn't be writing this post, because Esperanto characters would work out of the box!

You may try this out in your Ubuntu or other gnome-based distribution: in firefox and gedit, type ^ + a, ^ + e (using an international keyboard). You'll get the â and ê characters. Now type ^ + c, ^ + h. Firefox and gedit will beep and show neither ĉ nor ĥ. Now open xterm and type ^ + a, ^ + e, ^ + c, ^ + h. You'll get âêĉĥ.

In the X Window System, complex character typing is handled by modules called input methods. X has a default API for handling input methods, called XIM. That's what xterm uses. Gtk+ provides its own API and set of input methods for gtk+ based applications. Gtk+ input methods don't provide Esperanto characters. Why? Simply because gtk+ developers don't want to. I filed this bug more than one year ago. I also found a similar bug in the gnome bug database that is 4.5 years old. Now, in your Ubuntu, install some Qt/KDE application and type âêĉĥ. Easy, ain't it?

Fortunately, one of the input methods in the gtk+ set is the fallback XIM input method. It's the input method that says "I was unable to do what I was supposed to, so I'll let you use the standard, simpler input method because it works". In gedit, right-click in the text area, select the menu "input methods", then "X input method". Now type âêĉĥ. Voilà! Firefox doesn't have this menu, however :'(. How could one make the X input method the default, so that all applications work and I don't have to select it from a menu? Here we reach our second multilingualization bug. There's no way, in gnome or gtk, to set our default input method. That's right. No way. Ni feliĉe batu en la muron, denove! (Let us happily hit the wall, again!)

The solution is to edit by hand the configuration file given above, /etc/environment (or your personal environment file, ~/.bashrc, or maybe some other place, depending on your system). On the next boot, when applications load the new environment, XIM will take control. Finally, we have a multilingual environment for Ubuntu... but wait!

Firefox still doesn't work. Why is that? Here we have an Ubuntu-specific bug. More specifically: a 64-bit Ubuntu bug. Almost all the applications I use are compiled for 64-bit. Firefox (and a few others) is an exception. I need to browse sites that use Java and Flash. Java has just become free software. Flash is closed source software. Neither of them offers 64-bit support for firefox. So my firefox must be 32-bit (see the Ubuntu forums if you'd like to know more about this). There are a few support packages that I have to install in order to run 32-bit applications, like "ia32-libs", "ia32-libs-gtk", etc. Those packages don't correctly set up the 32-bit environment. Two extra hacks are necessary for 32-bit gtk apps to work. The first: add the following line to your firefox32 script (mine is /usr/local/bin/firefox32 or /usr/local/firefox32/firefox):

GTK_IM_MODULE_FILE=/etc/gtk-2.0/gtk.immodules.32

This first bug means that the 32 bit gtk looks by default in gtk.immodules, instead of gtk.immodules.32. The second bug is that the base 32 bit environment doesn't set locales. You need the following command:

ln -s /usr/lib/locale /usr/lib32/locale
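As an illustration of the first hack, here is a rough sketch of what a launcher script could look like. The function name run_with_32bit_im and the IMMODULES32 variable are invented for this example; the two paths are the ones quoted above, and your own firefox32 script may well be organized differently. Note that the variable has to reach the application's environment, which is why the sketch passes it through env rather than just assigning it.

```shell
#!/bin/sh
# Sketch of a launcher for a 32-bit gtk application.
# IMMODULES32 is a hypothetical variable; its default is the path from the post.
IMMODULES32="${IMMODULES32:-/etc/gtk-2.0/gtk.immodules.32}"

# Run the given command with GTK_IM_MODULE_FILE pointing at the 32-bit
# module list. GTK_IM_MODULE itself is inherited from the environment
# (for example from /etc/environment).
run_with_32bit_im() {
    env GTK_IM_MODULE_FILE="$IMMODULES32" "$@"
}

# Example call, commented out so that the sketch has no side effects:
# run_with_32bit_im /usr/local/firefox32/firefox "$@"
```

The locale symlink from the second hack is a one-time system change, so it stays outside the launcher.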

This should do. Notice how we pursued a single task: to write in Esperanto in Ubuntu. We've found 4 bugs. Not the way to go.

4 comments:

don said...

I also had problems typing in EO, but I managed to do so without editing anything. Granted my solution is just as annoying.

I enabled the Compose Key in the Keyboard menu, using one of the menu keys I don't use. I then enabled the Esperanto language in the language settings. Now, I don't know if this last step is actually necessary, because you still can't type in EO. But because I'm living in China at the moment, I also had the Chinese language enabled.

With Chinese comes SCIM, assuming the "enable support for complex characters" option is checked. Now with SCIM running the whole time, I can type EO using my compose key. Sed kial mi povas skribi sole kiam mi uzas la ĉinan? (But why can I write only when I use Chinese?)

"compose + shift6 + c" = ĉ and that works anywhere. And not just EO, anything in UTF-8, e.g., ⓚ.

The problem with this is, if I take away the Chinese language and/or SCIM, I can no longer type in EO, despite not actually using SCIM to type.

So if they could allow the compose key to work all the time, everything would be fine. But because they only enable the extra keys when SCIM is running, that's a nuisance.

hdante said...

Hi,

Thanks for the comment. I've experimented with SCIM here.

Like XIM, SCIM is another input method available for gtk+ (actually, SCIM is extensible, so it's a whole set of input methods). When you select the Chinese language setup as default, your whole system becomes "localized" to the Chinese language, and gtk knows that.

Then gtk maps the language that you are using to a predefined input method that is supposed to work with that language. For example, my system is configured to Portuguese, so, if I don't edit my configuration by hand, gtk automatically sets the input method to "cedilla". It's as if gtk went into my environment and automatically set

GTK_IM_MODULE=cedilla

Similarly, in your system, it defaults to

GTK_IM_MODULE=scim

SCIM has a default input method that is very powerful, and it allows Esperanto characters.

You may use SCIM even without configuring your system to Chinese. Open gedit, right click in the text box, and select SCIM. Now type compose+^+c. You should get a ĉ (you may need to set your keyboard to standard US, or something). Whenever SCIM is being used as the input method you can do this. Similarly, whenever XIM is used as the input method you can do this. Then we go back to firefox and it doesn't work, because it doesn't have this pop-up menu that allows you to change the input method. We need the environment variable.

Conclusion: if the input method that is mapped to your language accepts Esperanto characters, you are lucky. If it doesn't, you'll need to manually edit your environment file. The problem with gtk is that the "default" input method (which would be the input method that most people use) doesn't accept Esperanto characters (neither does "cedilla", which is the one I use). It would be very simple to fix this, but gtk people completely ignore it.

Simon said...

A patch was submitted for GTK+ that updated the list of compose sequences to the level provided in Xorg. This means that all compose sequences with dead_cedilla are available now.

See http://simos.info/blog/archives/661
on how to apply to your system, and more importantly test the patch.

Daniel said...

A very good article!

To avoid resetting the uptime (no ... I wanted to keep my screen session, and rebooting is the Windows way) I created the file

/etc/X11/Xsession.d/00x11-common_set-gtk-im

So I only had to restart X (Alt-Ctrl-BkSp).

The content is, as you described above

GTK_IM_MODULE=xim

and I added (not sure if it's necessary)

export GTK_IM_MODULE

I like to use the X Input Method because of its system-wide configuration and especially because of entries in my ~/.XCompose like the following

<Multi_key> <b> <t> <w> : "by the way"

For a good description of XCompose see http://cyberborean.wordpress.com/2008/01/06/compose-key-magic/