Nope, it's not. It's a much better idea to make sure the right tables
(KOI-8 -> Unicode -> font) are loaded. The first of these translations
still needs work.
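To make the two-table chain concrete, here is a minimal sketch. It assumes the KOI8-R variant of KOI-8 (via Python's `koi8_r` codec), and the `font` dictionary with its glyph indices is purely hypothetical; a real console font table would be loaded from elsewhere.

```python
# Table 1: KOI8-R byte -> Unicode character (assumption: the KOI8-R variant).
KOI8_TO_UNICODE = [bytes([b]).decode("koi8_r") for b in range(256)]

def to_glyphs(raw, font, fallback=0):
    """Map KOI-8 bytes to glyph indices by going through Unicode.

    `font` is a hypothetical Unicode -> glyph-index table; characters
    missing from it fall back to glyph `fallback`.
    """
    return [font.get(KOI8_TO_UNICODE[b], fallback) for b in raw]

# Hypothetical font with only two glyphs defined.
font = {"\u0430": 17, "A": 65}
glyphs = to_glyphs(b"\xc1A?", font)
```

The point of going through Unicode is that only the first table depends on the source character set, and only the second depends on the font.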
> UTF-8 is not quite as bad as many native multi-byte encodings,
> but it is still really bad. It can be used to store Unicode
> filenames on a hostile network filesystem. It is also a weak
> kind of compression for systems that are 95% ASCII and 5%
> of some mixed/large language(s). Other than that, forget it.
What do you mean by "still really bad"? I strongly disagree with that
statement; I think UTF-8 is the preferred form for interchange.
> Next, the kernel must translate filenames. I want to put your
> KOI-8 floppy in my system and read it the right way as well
> as I can. If I convert to full Unicode, I want to read every
> filesystem I can find. This requires a mount option for every
> filesystem with a poorly defined character set.
This is exactly the wrong thing to do. We *DON'T* want this kind of
crap in the system. If anything, we're much better off standardizing
on Unicode. Otherwise the kernel has to know about every bloody
character set in existence -- this is completely, utterly intolerable.
> That leaves only the kernel API. The standard way of fixing
> an API will do quite well: alternate system calls for raw
> 16-bit Unicode. Only the calls that take/return 8-bit text
> need alternates. The old calls do _not_ get deprecated, at
> least not much. They need to use plain 8-bit (not multi-byte)
> text and remain that way for the next 30 years at least.
> For the new API, pick the byte order with Java and vfat in mind.
> The '/' and '\0' are safe: the kernel uses 16-bit versions.
Great; you do know that Java and VFAT use opposite byte order, right?
This is the wrong thing to do. Use UTF-8 encoding as the multibyte
set, and do conversion to wide characters if you want to. The Asians
are -- for good reason -- already screaming bloody murder over 16
bits; either we end up using an awful kluge like UTF-16, or we stick
to 8-bit bytes and use UTF-8, which handles all of UCS-4 quite
elegantly.
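A small sketch of both complaints about a 16-bit API, using Python's built-in codecs. It assumes big-endian 16-bit units for Java's serialized form and little-endian units for VFAT long names; the variable names are illustrative only.

```python
name = "a/b"

# Same three characters, opposite byte orders.
as_java = name.encode("utf-16-be")   # assumed Java-style big-endian
as_vfat = name.encode("utf-16-le")   # assumed VFAT-style little-endian

# Every ASCII character now carries a zero byte, so neither form can
# pass through an interface that treats '\0' as a terminator.
has_nul = b"\x00" in as_java and b"\x00" in as_vfat

# A character beyond 16 bits needs a surrogate pair in UTF-16 ...
beyond_16_bits = chr(0x10000)
surrogate_pair = beyond_16_bits.encode("utf-16-be")

# ... while UTF-8 simply uses a longer sequence.
utf8_form = beyond_16_bits.encode("utf-8")

print(as_java.hex(" "), "|", as_vfat.hex(" "))
print(surrogate_pair.hex(" "), "|", utf8_form.hex(" "))
```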
Backward compatibility with 8 bits and forward compatibility with > 16
bits (planes 1 and 2 in ISO 10646 are already being defined) is what
leads me to say that UTF-8 is the way to go.
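Both compatibility claims can be checked with a hand-written encoder. This is a minimal sketch, assuming the original UTF-8 definition with sequences of up to six bytes covering the 31-bit UCS-4 range; `utf8_encode` is a name made up for this example, not an existing API.

```python
def utf8_encode(cp):
    """Encode one UCS-4 code point with the original 1..6 byte UTF-8 scheme."""
    if not 0 <= cp <= 0x7FFFFFFF:
        raise ValueError("code point outside the 31-bit UCS-4 range")
    if cp < 0x80:
        return bytes([cp])  # 7-bit ASCII passes through unchanged
    # (exclusive upper limit, lead-byte marker, continuation byte count)
    for limit, lead, ncont in (
        (0x800, 0xC0, 1),
        (0x10000, 0xE0, 2),
        (0x200000, 0xF0, 3),
        (0x4000000, 0xF8, 4),
        (0x80000000, 0xFC, 5),
    ):
        if cp < limit:
            break
    # Lead byte carries the top bits; each continuation byte carries 6 more.
    out = [lead | (cp >> (6 * ncont))]
    for i in range(ncont - 1, -1, -1):
        out.append(0x80 | ((cp >> (6 * i)) & 0x3F))
    return bytes(out)

# Backward compatibility: '/' and NUL stay single, unchanged bytes.
slash = utf8_encode(ord("/"))
nul = utf8_encode(0)

# Forward compatibility: code points past 16 bits just get longer sequences.
plane_1 = utf8_encode(0x10000)
largest = utf8_encode(0x7FFFFFFF)
```

Every byte of a multi-byte sequence has its high bit set, so a '/' or '\0' byte in a UTF-8 filename can only ever mean the ASCII character itself. That is why existing 8-bit system calls can carry UTF-8 without modification.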
-hpa
-- 
PGP: 2047/2A960705 BA 03 D3 2C 14 A8 A8 BD 1E DF FE 69 EE 35 BD 74
See http://www.zytor.com/~hpa/ for web page and full PGP public key
Always looking for a few good BOsFH. ** Linux - the OS of global cooperation
I am Baha'i -- ask me about it or see http://www.bahai.org/