Re: Improved console UTF-8 support for the Linux kernel?
From: Simos Xenitellis
Date: Sun Dec 12 2004 - 09:04:35 EST
Jan Engelhardt wrote:
> >> The current UTF-8 keyboard input (for the console) of the Linux kernel
> >> does not support "composing" or writing characters with accents.
> That's weird, because "Ã" (LATIN O WITH DIAERESIS) -- which clearly lies
> outside the 7-bit range, is working on my system without myself poking the
> kernel. Both hitting the key or using compose mode. This also applies to
> A-with-DIAERESIS, U-with-DIAERESIS, sharp german S, but does not for anything
> else, e.g. compose-'-e to generate E with accent aigu.
I am a bit confused. Could you please comment on the following, as a
common test steps?
I am not sure how you wrote the above characters. According to UTF-8,
characters with codepoints above 0x79 require two bytes so that to be
valid. When you compose "Ã" (you press something like ";", then "o") in
For simplicity, let's assume you do something like
% loadkeys --unicode
keycode 53 = 0x0d2f
compose '/' 'q' to U+00F6
compose '/' 'w' to U+00F7
compose '/' 'e' to U+00F8
compose '/' 'r' to U+00F9
compose '/' 't' to U+0100
compose '/' 'y' to U+0101
keycode 2 = U+00F6
keycode 3 = U+00F7
keycode 4 = U+00F8
keycode 5 = U+00F9
keycode 6 = U+0100
keycode 7 = U+0101
Dead key (due to "0d") is the character "/" (0x2f).
Keycodes 2-7 are keys for numbers 1-6.
To test, I type
% cat > test.txt
<we try out all key compositions to generate U+00F6-U+0101>
When we try keys 1-6, we get
% od -x text.txt
0000000 b6c3 b7c3 b8c3 b9c3 80c4 81c4 000a
which is correct.
When we try using the dead key "/" and q-y, we get
% od -x test.txt
0000000 f7f6 f9f8 0100 000a
To get the keyboard in a sane mode, "loadkeys --unicode -d".
>From here we see there is no conversion to UTF-8 whatsoever.
In the second case, the kernel cannot return the full character when it
is in Unicode mode.
> >Yes, i recently find it out when trying to switch all my system to
> >UTF-8. But the patch from Chris you mention below works very well
> >for me (and for anybody that needs to type compose characters for
> >languages based in the latin1 encoding i guess).
> >> Is there an interest for re-submission of mentioned patches for
> >> inclusion in the kernel (yeah, provided coding style is "normalised")?
> >At least, I am _really_ interested :)
> So am I. I have to use xterm for anything fancy now...
> (especially for the even-more fancy stuff that begins at three-byte UTF8
> sequences, such as Japanese :-)
Good. I hope more people raise their hands for this.
[I am sending this again. It did not make it to the kernel mailing list in the first^Wsecond post for some reason..]
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/