Re: UTF-8, OSTA-UDF [why?], Unicode, and miscellaneous gibberish

Andrew E. Mileski (aem@netcom.ca)
Tue, 26 Aug 1997 23:09:02 -0400 (EDT)


> It will be fine if those issues will be handled in userspace. But kernel
> should be charset-neutral except in handling devices and filesystems with
> hardcoded unchangeable charset. Mandatory userspace translation of
> charsets for single-charset kernel is unacceptable in situations where
> kernel can be just transparent.

There is only one real problem in the whole kernel: the console.

The filesystems (even ISO9660 level 1, which probably has the smallest
charset) could all get along in a multi-byte environment. It wouldn't
be portable, of course, but that can't be helped. As long as a charset
translation is reversible, nothing else really matters.
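
To make "reversible" concrete, here is a rough sketch in Python (nothing
like real kernel code). The case-swapping table and the names
invert_table, make_case_swap_table, to_disk and from_disk are made up
for illustration. The only point is that a byte table can be undone
exactly when it is one-to-one:

```python
def invert_table(table: bytes) -> bytes:
    """Return the inverse of a 256-entry byte translation table.

    Raises ValueError if the table is not one-to-one, because such a
    translation loses information and cannot be reversed.
    """
    if len(table) != 256 or len(set(table)) != 256:
        raise ValueError("table is not a one-to-one mapping of all 256 byte values")
    inverse = bytearray(256)
    for src, dst in enumerate(table):
        inverse[dst] = src
    return bytes(inverse)


def make_case_swap_table() -> bytes:
    """Hypothetical translation: swap ASCII upper and lower case."""
    table = bytearray(range(256))
    for lower in range(ord("a"), ord("z") + 1):
        upper = lower - 32
        table[lower], table[upper] = upper, lower
    return bytes(table)


to_disk = make_case_swap_table()    # assumed filesystem-side charset
from_disk = invert_table(to_disk)   # the reverse translation

name = b"README.TXT;1"
stored = name.translate(to_disk)        # what would be written out
restored = stored.translate(from_disk)  # what comes back when read
```

A table that maps two different bytes to the same value fails the check
in invert_table, which is the case where a name could not be recovered.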

We could even specify a multi-byte separator (instead of '/') and
terminator (instead of 0x00) by using an encoding scheme the way UTF-8
does, but Unicode doesn't have to be the charset used. It could be
anything, even Klingon, though you lose charset portability.
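
For what it's worth, the property of UTF-8 that matters here is that
every byte of a multi-byte sequence is 0x80 or above, so '/' (0x2F) and
0x00 can only ever mean themselves. A small Python sketch of that,
compared against a fixed-width 16-bit encoding; the helper name
reserved_bytes_in and the sample path are just assumptions for the
example:

```python
RESERVED = (0x00, 0x2F)  # string terminator and '/' path separator


def reserved_bytes_in(data: bytes) -> list:
    """Return every reserved byte value that occurs in data."""
    return [b for b in data if b in RESERVED]


ch = "\u012f"  # LATIN SMALL LETTER I WITH OGONEK

as_utf8 = ch.encode("utf-8")       # two bytes, both >= 0x80
as_utf16 = ch.encode("utf-16-le")  # low byte happens to be 0x2F

utf8_hits = reserved_bytes_in(as_utf8)
utf16_hits = reserved_bytes_in(as_utf16)

# Splitting on the single byte '/' is therefore safe for UTF-8 names.
path = ("dir/" + ch + ".txt").encode("utf-8")
components = path.split(b"/")
```

Any other encoding with the same property would do just as well, which
is why the charset underneath need not be Unicode.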

The console is a problem because it has a fixed representation
that cannot be mucked with. Example: a space has to look the
same in all charsets, but may have a different byte value in each!
The console charset is also locale specific.
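
A quick illustration of the space example, again only a sketch in
Python. ASCII and EBCDIC (code page 037) are my choice of charsets
here, and glyph_for is a made-up stand-in for whatever the console does
to turn a byte into something on the screen:

```python
def glyph_for(byte_value: int, charset: str) -> str:
    """Character a console would have to draw for this byte in this charset."""
    return bytes([byte_value]).decode(charset)


space_ascii = " ".encode("ascii")    # space is 0x20 in ASCII
space_ebcdic = " ".encode("cp037")   # space is 0x40 in EBCDIC code page 037

# The same byte value means different things depending on the charset:
# 0x40 is '@' in ASCII but a space in EBCDIC.
shown_as_ascii = glyph_for(0x40, "ascii")
shown_as_ebcdic = glyph_for(0x40, "cp037")
```

So a console that draws byte values directly, without knowing which
charset they came from, will get at least one of these wrong.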

AFAIK, it is impossible to have a single charset used by the entire
kernel that is not specific to the locale, unless translation is
provided for the console.

--
Andrew E. Mileski   mailto:aem@netcom.ca