Re: Unicode status

Alan Cox (alan@lxorguk.ukuu.org.uk)
Sat, 30 Nov 1996 20:07:17 +0000 (GMT)


> thought of a stable/robust solution. As far as I know switching
> to K_UNICODE is risky. Nobody handles ^C, ^Z ... and if ^C were
> processed the user would end up with K_UNICODE in his shell. And
> not everybody has a network connection to escape from this. (Yes
> I know there is/are a/some trick(s) to get out of this.)

K_UNICODE shouldn't be removing the control code handling in the tty layer.
There is definitely an issue in that you can't, for example, set your
escape character to a multibyte sequence. Unfortunately the POSIX.1 folk
didn't consider that issue.
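
To make the split concrete, here is a rough sketch (assuming a Linux
virtual console on /dev/tty; error handling is only illustrative).
KDSKBMODE/K_UNICODE changes what the keyboard delivers, while ^C and
friends stay in termios, each in a single-byte c_cc[] slot, which is
precisely why a multibyte escape character can't be expressed:

/*
 * Sketch: switch a virtual console keyboard into Unicode mode and show
 * that the interrupt character remains a single byte in termios.
 */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <termios.h>
#include <sys/ioctl.h>
#include <linux/kd.h>          /* KDSKBMODE, K_UNICODE */

int main(void)
{
    int fd = open("/dev/tty", O_RDWR);
    if (fd < 0) { perror("open"); return 1; }

    /* Keyboard now delivers keysyms as UTF-8 byte sequences. */
    if (ioctl(fd, KDSKBMODE, K_UNICODE) < 0)
        perror("KDSKBMODE");

    /*
     * The line discipline is untouched: ISIG, VINTR (^C) and friends
     * still live in termios, and each c_cc[] slot is one cc_t byte,
     * so a multibyte "escape character" simply cannot be stored there.
     */
    struct termios t;
    if (tcgetattr(fd, &t) == 0)
        printf("VINTR is 0x%02x, one byte wide\n", t.c_cc[VINTR]);

    close(fd);
    return 0;
}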

> >Its not hard, and the UTF-8 encoding used needs no extra magic to handle it.
> >You can even use it in file names.
>
> But 90% of all programs for Linux are displaying either a ? or \xxx instead.
> They often don't even show characters that are in Latin1 encoding.

A lot of tools are designed for the existing world. They do need some
gentle coaxing to know you are using a UTF-8 aware link. From a kernel
point of view it's mostly there. There is a GNU internationalisation project,
although this is mostly stuck in the European 8-bit left-to-right rendered
world. Perhaps a display method environment variable is called for. Teaching
ls etc. not to mishandle 8-bit chars is then fairly simple.
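
A sketch of what such a variable could look like (the DISPLAY_CHARSET
name below is made up purely for illustration): the tool checks it once,
then either passes high-bit bytes straight through to a UTF-8 aware
terminal or falls back to the usual octal escaping.

/*
 * Sketch of the "display method" idea for an ls-like tool.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static int utf8_display;        /* 1 = pass bytes through, 0 = escape */

static void init_display_method(void)
{
    /* Hypothetical variable name, chosen only for this example. */
    const char *m = getenv("DISPLAY_CHARSET");
    utf8_display = (m != NULL && strcmp(m, "utf8") == 0);
}

static void print_name(const char *name)
{
    const unsigned char *p;
    for (p = (const unsigned char *)name; *p; p++) {
        if (*p < 0x80 || utf8_display)
            putchar(*p);            /* plain ASCII or trusted UTF-8 */
        else
            printf("\\%03o", *p);   /* legacy \xxx octal escape */
    }
    putchar('\n');
}

int main(int argc, char **argv)
{
    int i;
    init_display_method();
    for (i = 1; i < argc; i++)
        print_name(argv[i]);
    return 0;
}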

There's a project there for someone.

Alan