Re: GGI, EGCS/PGCC, Kernel source

Martin von Loewis (martin@mira.isdn.cs.tu-berlin.de)
Fri, 27 Feb 1998 10:00:21 +0100


> I don't think this is technically correct, is it? UTF8 is an *encoding*
> for unicode; it covers the same symbols while retaining compatibility with
> standard ASCII (special prefixes flag long characters).

Correct.
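
To make the prefix scheme concrete, here is a minimal Python sketch. The
helper name utf8_bits and the sample characters are illustrative choices,
not anything from the discussion above. It shows that an ASCII character
keeps its single byte, while longer characters start with a lead byte
whose high bits flag the sequence length.

```python
def utf8_bits(ch):
    """Return the UTF-8 bytes of one character as 8-bit binary strings."""
    return [format(b, "08b") for b in ch.encode("utf-8")]

# ASCII stays one byte with the high bit clear.
ascii_form = utf8_bits("A")

# U+00E9: the lead byte starts with 110 (two-byte sequence),
# the continuation byte starts with 10.
two_byte_form = utf8_bits("\u00e9")

# U+4E2D, a CJK ideograph in the BMP: the lead byte starts with 1110.
three_byte_form = utf8_bits("\u4e2d")

print(ascii_form, two_byte_form, three_byte_form)
```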

> Or am I wrong, is Unicode16 a subset of the unicode char set?

16-bit Unicode is not a subset of Unicode; it is a subset of ISO
10646, the international standard that defines Unicode. This
standard assumes a 32-bit character set and defines a number of
encodings. One is UTF-8; another is UCS-2, a 16-bit encoding
capable of representing only the Basic Multilingual Plane (BMP,
aka Unicode). UTF-8, OTOH, covers all of the 10646 characters.
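
The difference in coverage can be sketched in Python. Python ships no
codec named UCS-2, so encode_ucs2 below is a hypothetical helper written
for this illustration, and the sample code points are my own choice. It
emits one big-endian 16-bit unit per character and refuses anything
outside the BMP, whereas the built-in UTF-8 codec encodes a plane 2 code
point without complaint.

```python
import struct

def encode_ucs2(text):
    """Hypothetical helper: big-endian UCS-2, one 16-bit unit per character."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            # UCS-2 has no way to express code points beyond the BMP.
            raise ValueError("U+%04X is outside the BMP" % cp)
        out += struct.pack(">H", cp)
    return bytes(out)

bmp_char = "\u4e2d"        # inside the BMP
beyond_bmp = "\U00020000"  # first code point of plane 2

ucs2_form = encode_ucs2(bmp_char)        # two bytes
utf8_form = beyond_bmp.encode("utf-8")   # four bytes

try:
    encode_ucs2(beyond_bmp)
    ucs2_handles_plane2 = True
except ValueError:
    ucs2_handles_plane2 = False
```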

Please note that only a few characters are currently defined outside
the BMP. I understand that the Chinese standards body is working on
filling planes 2 and 3 with characters used in the names of people
and places.

As for 'covers more symbols': UTF-16 is an interesting encoding that
uses 16-bit units and still covers the first 17 planes (characters
beyond the BMP take a pair of units).
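
As a rough sketch of how 16-bit units can reach 17 planes: a code point
above the BMP is split into two units drawn from reserved ranges, a
surrogate pair. The function name to_surrogates is an illustrative
choice; the arithmetic follows the UTF-16 definition, and the result is
cross-checked against Python's built-in utf-16-be codec.

```python
import struct

def to_surrogates(cp):
    """Split a code point in planes 1..16 into a UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point is not in planes 1..16")
    offset = cp - 0x10000             # 20 bits remain
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return high, low

first_pair = to_surrogates(0x20000)   # first code point of plane 2
last_pair = to_surrogates(0x10FFFF)   # last code point of plane 16

# Cross-check against the codec that ships with Python.
codec_pair = struct.unpack(">HH", "\U00020000".encode("utf-16-be"))
```

Two 10-bit halves give 2^20 code points, which is exactly 16 planes on
top of the BMP.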

Regards,
Martin
