Re: Improved console UTF-8 support for the Linux kernel?

From: Simos Xenitellis
Date: Sun Dec 12 2004 - 09:04:35 EST

Jan Engelhardt wrote:
> >> The current UTF-8 keyboard input (for the console) of the Linux kernel
> >> does not support "composing" or writing characters with accents.
> That's weird, because "Ã" (LATIN O WITH DIAERESIS) -- which clearly lies
> outside the 7-bit range, is working on my system without myself poking the
> kernel. Both hitting the key or using compose mode. This also applies to
> A-with-DIAERESIS, U-with-DIAERESIS, sharp german S, but does not for anything
> else, e.g. compose-'-e to generate E with accent aigu.

I am a bit confused. Could you please comment on the following, as a
common test steps?

I am not sure how you wrote the above characters. According to UTF-8,
characters with codepoints above 0x79 require two bytes so that to be
valid. When you compose "Ã" (you press something like ";", then "o") in
the console?

For simplicity, let's assume you do something like
% loadkeys --unicode
keycode 53 = 0x0d2f
compose '/' 'q' to U+00F6
compose '/' 'w' to U+00F7
compose '/' 'e' to U+00F8
compose '/' 'r' to U+00F9
compose '/' 't' to U+0100
compose '/' 'y' to U+0101
keycode 2 = U+00F6
keycode 3 = U+00F7
keycode 4 = U+00F8
keycode 5 = U+00F9
keycode 6 = U+0100
keycode 7 = U+0101

Dead key (due to "0d") is the character "/" (0x2f).
Keycodes 2-7 are keys for numbers 1-6.
To test, I type
% cat > test.txt
<we try out all key compositions to generate U+00F6-U+0101>

When we try keys 1-6, we get
% od -x text.txt
0000000 b6c3 b7c3 b8c3 b9c3 80c4 81c4 000a
which is correct.

When we try using the dead key "/" and q-y, we get
% od -x test.txt
0000000 f7f6 f9f8 0100 000a

To get the keyboard in a sane mode, "loadkeys --unicode -d".

>From here we see there is no conversion to UTF-8 whatsoever.

In the second case, the kernel cannot return the full character when it
is in Unicode mode.

> >Yes, i recently find it out when trying to switch all my system to
> >UTF-8. But the patch from Chris you mention below works very well
> >for me (and for anybody that needs to type compose characters for
> >languages based in the latin1 encoding i guess).
> >
> >> Is there an interest for re-submission of mentioned patches for
> >> inclusion in the kernel (yeah, provided coding style is "normalised")?
> >
> >At least, I am _really_ interested :)
> So am I. I have to use xterm for anything fancy now...
> (especially for the even-more fancy stuff that begins at three-byte UTF8
> sequences, such as Japanese :-)

Good. I hope more people raise their hands for this.


[I am sending this again. It did not make it to the kernel mailing list in the first^Wsecond post for some reason..]

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at
Please read the FAQ at