Re: unicode (char as abstract data type)

H. Peter Anvin (hpa@transmeta.com)
18 Apr 1998 02:03:10 GMT


Followup to: <19980417212059.55001@hazel>
By author: Raul Miller <rdm@test.legislate.com>
In newsgroup: linux.dev.kernel
>
> Albert D. Cahalan <acahalan@cs.uml.edu> wrote:
> > Think about the consequences of UTF-8 at the system call level:
> > Every system call that uses text must be first converted to UTF-8.
> > This burden is with us forever. Meanwhile, Windows and MacOS can
> > avoid conversion costs after the world converts to UCS2.
>
> If there were any reason to do it, you could create a personality which
> used (for example) UTF-16 instead. With a bit of work, you could write
> a libc which could work properly with either the UTF-8 or the UTF-16
> personality (a crude one, might perhaps use a boot mechanism which
> points at a different ld.so, depending).
>

UTF-16 is an awful hack; it combines all the disadvantages of
multibyte encoding with all the disadvantages of wide characters, and
adds a few new ones for good measure. UTF-8 is much cleaner. If
people want to use UCS-2, UCS-4 or UTF-16 in applications, let them,
but let's use UTF-8 for external data representation.
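
A minimal sketch of the point above, in Python. The sample strings and the
describe() helper are illustrative assumptions, not anything from this thread.
It reports, per encoding, how many bytes a string takes and whether the
encoded form contains a NUL byte.

```python
# Illustrative sample text: plain ASCII, a BMP character, a non-BMP character.
samples = {
    "ascii": "open",
    "bmp": "\u00e9",          # LATIN SMALL LETTER E WITH ACUTE, U+00E9
    "non_bmp": "\U0001D11E",  # MUSICAL SYMBOL G CLEF, U+1D11E
}

def describe(text):
    """Map each encoding name to (byte length, contains a NUL byte)."""
    result = {}
    for name in ("utf-8", "utf-16-le", "utf-16-be"):
        data = text.encode(name)
        result[name] = (len(data), b"\x00" in data)
    return result

for label, text in samples.items():
    print(label, describe(text))
```

ASCII text comes out of UTF-8 byte-for-byte unchanged and free of NUL bytes,
so NUL-terminated strings keep working. In UTF-16 the same ASCII text doubles
in size and contains NULs, there are two byte orders to choose between, and a
character outside the BMP still needs a four-byte surrogate pair, so the
encoding is variable-length anyway.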

> Anyways, point is: you can change the interfaces, if you decide
> it's worth the work. Right now, though, the only applications I've
> used under linux which are unicode-aware are things derived from
> plan-9, and these are perfectly happy with utf-8. (9fonts is also
> the only unicode font (utf-2) I've got).

Well, Plan 9 is, if anything, a good example of why UTF-8 is the right
thing.

> However, the last thing we want to do is implement utf-16 in the kernel
> because microsoft/apple says that they're going to someday.

No kidding. If anything, it should be a hint that something is broken.

-hpa

-- 
    PGP: 2047/2A960705 BA 03 D3 2C 14 A8 A8 BD  1E DF FE 69 EE 35 BD 74
    See http://www.zytor.com/~hpa/ for web page and full PGP public key
        I am Bahá'í -- ask me about it or see http://www.bahai.org/
   "To love another person is to see the face of God." -- Les Misérables
