No, it makes it a crude hack.
> With UTF-8, repeated conversions are unavoidable and somewhat complex.
> Most apps will _severely_ mishandle text on a UTF-8 system. At least
> with raw Unicode you know that newer apps will operate without
> overhead and older apps won't split UTF-8 characters. At worst you
> have a byte order swap.
Most apps will simply not operate on the UCS-2 system you propose.
> > This is exactly the wrong thing to do. We *DON'T* want this
> > kind of crap in the system. If so, we're much better off
> > standardizing on Unicode. Otherwise the kernel has to know
> > about every bloody character set in existence -- this is
> > completely utterly intolerable.
>
> It is funny to see that from you, because I think you had something
> to do with loadable translation tables for the console. Do you also
> find that completely utterly intolerable? There are already several
> reimplementations of it for filesystems. Wouldn't it be better if
> they could share the same code and translation tables?
Sort of, but what you propose requires the kernel to know *EVERY*
character set. Loading a map for the console is only required for the
character set you want to *use*. Unfortunately, the tables the
console requires are not adequate for filesystem use, and the inverted
tables a filesystem would require are *MUCH* larger.
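
To make the size difference concrete, here is a rough sketch, not a description of any actual kernel table layout. It assumes Python's "cp437" codec as a stand-in for a loaded console map and a naive direct-indexed inverse table over the 16-bit range; the names CHARSET, forward, inverse and NO_MAPPING are made up for illustration.

```python
# Assumption: "cp437" stands in for whatever character set map is loaded.
CHARSET = "cp437"

# Forward map, the direction a console needs: byte value -> code point.
# One entry per possible byte, so 256 entries per character set.
forward = [ord(bytes([b]).decode(CHARSET)) for b in range(256)]

# Inverse map, the direction a filesystem would need: code point -> byte.
# A naive direct-indexed table has to span the whole 16-bit range,
# even though only 256 slots are ever filled.
NO_MAPPING = -1
inverse = [NO_MAPPING] * 0x10000
for byte_value, code_point in enumerate(forward):
    inverse[code_point] = byte_value

used = sum(1 for entry in inverse if entry != NO_MAPPING)
print(f"forward entries: {len(forward)}")
print(f"inverse entries: {len(inverse)} ({used} in use)")
```

A sparse structure would shrink the inverse table, but it is still extra data and lookup code per character set, on top of what the console map already carries.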
> > This is the wrong thing to do. Use UTF-8 encoding as the
> > multibyte set, and do conversion to wide characters if you
> > want to.
>
> Since the conversion is not cheap and UTF-8 breaks everything
> anyway, we might as well do this the Right Way with 16-bit
> characters all across the API. The old calls must remain
> as single-byte encoded for normal apps.
Except it isn't the right thing.
> > The Asians are -- for good reason -- already screaming bloody
> > murder over 16 bits; either we end up using an awful kluge
> > like UTF-16, or we stick to 8-bit bytes and use UTF-8, which
> > handles all of UCS-4 quite elegantly.
>
> Normal everyday "characters" fit in 16 bits. Since there are
> more characters every day, they can't all go into halfway
> portable filenames anyway. This is why word processors and HTML
> let you embed an image of <img src="foo.gif"> as needed.
Tell that to the Chinese person who can't write his name because it's
beyond 16 bits. It's a lose.
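
As a small illustration of the point, the sketch below takes U+20BB7, chosen here only as an example of a CJK code point above U+FFFF, and shows what each encoding does with it. It uses Python's built-in codecs; the variable names are illustrative.

```python
# Assumption: U+20BB7 is just an example of a code point beyond 16 bits.
code_point = 0x20BB7
char = chr(code_point)

# A single 16-bit unit (UCS-2) cannot hold this value at all.
fits_in_16_bits = code_point <= 0xFFFF

# UTF-8 encodes it as an ordinary 4-byte multibyte sequence.
utf8 = char.encode("utf-8")

# UTF-16 has to split it into a surrogate pair (two 16-bit units).
utf16 = char.encode("utf-16-be")
units = [int.from_bytes(utf16[i:i + 2], "big") for i in range(0, len(utf16), 2)]

print("fits in 16 bits:", fits_in_16_bits)
print("UTF-8 bytes:    ", utf8.hex(" "))
print("UTF-16 units:   ", " ".join(f"{u:04X}" for u in units))
```

Byte-oriented code that passes strings through untouched keeps working with the UTF-8 form, while every 16-bit API has to learn about surrogate pairs or reject the character.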
-hpa
--
    PGP: 2047/2A960705 BA 03 D3 2C 14 A8 A8 BD 1E DF FE 69 EE 35 BD 74
    See http://www.zytor.com/~hpa/ for web page and full PGP public key
    Always looking for a few good BOsFH. ** Linux - the OS of global cooperation
    I am Baha'i -- ask me about it or see http://www.bahai.org/