Re: Kernel & Unicode

H. Peter Anvin (hpa@transmeta.com)
27 Aug 1997 15:49:39 GMT


Followup to: <199708271013.GAA04659@lynx.dac.neu.edu>
By author: Albert Cahalan <acahalan@lynx.dac.neu.edu>
In newsgroup: linux.dev.kernel

> What about large character sets? You must use Unicode. For all the
> normal applications, the normal system calls _must_ remain _pure_
> 8-bit for the next 30 years. Sorry, UTF-8 and BIG5 both fail.

Albert, the current system calls work JUST FINE for both UTF-8 and
Big5. There is nothing broken, AT ALL, so don't fix it.
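
To illustrate the UTF-8 half of this, here is a minimal sketch in C. The
helper names (utf8_encode_bmp, has_slash_or_nul) are made up for this
example and come from no library. It relies on a property of UTF-8: every
byte of a multibyte sequence has its high bit set, so the bytes '/' and
'\0' appear only when they really are those characters. That is why a
byte-oriented open(2) can pass such names through unchanged.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: encode one 16-bit code point as UTF-8.
 * Returns the number of bytes written to out (1 to 3).
 * Assumption: surrogate values (0xD800..0xDFFF) are not rejected here.
 */
size_t utf8_encode_bmp(uint16_t cp, unsigned char out[3])
{
    if (cp < 0x80) {
        /* ASCII stays a single identical byte. */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    out[0] = (unsigned char)(0xE0 | (cp >> 12));
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
}

/*
 * Hypothetical helper: report whether a byte buffer contains a byte
 * that a byte-oriented path parser treats specially ('/' or NUL).
 */
int has_slash_or_nul(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '/' || buf[i] == '\0')
            return 1;
    }
    return 0;
}
```

For example, U+2F00 stored as raw big-endian 16-bit data is the byte pair
0x2F 0x00, which a byte-oriented parser reads as '/' followed by a string
terminator. The same character in UTF-8 is 0xE2 0xBC 0x80, which contains
neither.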

> This problem can be fixed the same way other system call problems
> get fixed: add a second set of system calls or a personality.
> Only true 16-bit Unicode can work right, and it is not at all
> compatible with the existing API.

This is simply not true. UTF-8 is an encoding of Unicode, and it passes
through the existing 8-bit API unchanged.

> Sun and Microsoft both use the 2-byte encoding. It is best.
> You can send that directly into the kernel, but not via the old
> system calls. You need an open(2) that uses a 16-bit '/' for
> the path and a 16-bit '\0' for the end of a string.

If you want a wrapper around open() that converts your UCS-2 (or UCS-4)
string to UTF-8 so you don't have to bother, write it in user space and
put it in a library (libucs2, libucs4). Easy.
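
A rough sketch of what such a user-space wrapper might look like, in C.
The names ucs2_to_utf8 and ucs2_open are hypothetical, chosen for this
example, and are not taken from any existing libucs2. The sketch assumes
the input is a 0-terminated array of 16-bit UCS-2 code units in host byte
order, and it rejects surrogate values since UCS-2 has none.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>

/*
 * Hypothetical helper: convert a 0-terminated UCS-2 string to a newly
 * malloc'd, NUL-terminated UTF-8 string. The caller frees the result.
 * Returns NULL on allocation failure or on a surrogate value (EILSEQ).
 */
char *ucs2_to_utf8(const uint16_t *src)
{
    size_t n = 0;
    while (src[n] != 0)
        n++;

    /* Each UCS-2 unit needs at most 3 UTF-8 bytes, plus the final NUL. */
    unsigned char *dst = malloc(3 * n + 1);
    if (dst == NULL)
        return NULL;

    size_t j = 0;
    for (size_t i = 0; i < n; i++) {
        uint16_t c = src[i];
        if (c >= 0xD800 && c <= 0xDFFF) {
            /* Surrogates are not valid UCS-2 characters. */
            free(dst);
            errno = EILSEQ;
            return NULL;
        }
        if (c < 0x80) {
            dst[j++] = (unsigned char)c;
        } else if (c < 0x800) {
            dst[j++] = (unsigned char)(0xC0 | (c >> 6));
            dst[j++] = (unsigned char)(0x80 | (c & 0x3F));
        } else {
            dst[j++] = (unsigned char)(0xE0 | (c >> 12));
            dst[j++] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
            dst[j++] = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    dst[j] = '\0';
    return (char *)dst;
}

/*
 * Hypothetical wrapper: same contract as open(2), but the path is UCS-2.
 * The kernel only ever sees the UTF-8 byte string.
 */
int ucs2_open(const uint16_t *path, int flags, mode_t mode)
{
    char *utf8 = ucs2_to_utf8(path);
    if (utf8 == NULL)
        return -1;

    int fd = open(utf8, flags, mode);
    int saved_errno = errno;   /* free() must not disturb open()'s errno */
    free(utf8);
    errno = saved_errno;
    return fd;
}
```

A UCS-4 variant would follow the same pattern with 32-bit input and up to
four UTF-8 bytes per character. Either way, no new system call is needed.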

-hpa

-- 
    PGP: 2047/2A960705 BA 03 D3 2C 14 A8 A8 BD  1E DF FE 69 EE 35 BD 74
    See http://www.zytor.com/~hpa/ for web page and full PGP public key
Always looking for a few good BOsFH.  **  Linux - the OS of global cooperation
        I am Baha'i -- ask me about it or see http://www.bahai.org/