Re: UTF-8, OSTA-UDF [why?], Unicode, and miscellaneous gibberish

Alex Belits (abelits@phobos.illtel.denver.co.us)
Mon, 25 Aug 1997 23:42:43 -0700 (PDT)


On 26 Aug 1997, Kai Henningsen wrote:

> > Of course, we could make our own encoding, say Linux-8, where bit 7 set
> > means there are 7 more bits in the next byte to look at. A pair of codes
> > could be reserved as a "terminator" and "separator", and used universally.
> > This scheme is infinitely expandable, and not limited like UTF-8. <Laugh>
> > Should the day come, we could support a multitude of alien dialects :-)
>
> Like with UTF-8, you mean? :-)
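
For reference, the "Linux-8" scheme quoted above (bit 7 set means more bits
follow in the next byte) could look roughly like the sketch below. The names
encode_varint and decode_varint are made up for illustration, and the
big-endian group order is an assumption; the reserved "terminator" and
"separator" codes are left out.

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer with 7 payload bits per byte."""
    if n < 0:
        raise ValueError("only non-negative values can be encoded")
    groups = [n & 0x7F]                    # final byte: bit 7 clear
    n >>= 7
    while n:
        groups.append((n & 0x7F) | 0x80)   # bit 7 set: more bits follow
        n >>= 7
    return bytes(reversed(groups))         # most significant group first


def decode_varint(data: bytes) -> int:
    """Decode exactly one value produced by encode_varint."""
    value = 0
    for i, b in enumerate(data):
        value = (value << 7) | (b & 0x7F)
        if not b & 0x80:                   # bit 7 clear: last byte
            if i != len(data) - 1:
                raise ValueError("trailing bytes after end of value")
            return value
    raise ValueError("truncated sequence")
```

Nothing bounds the length, so decode_varint(encode_varint(2**100)) round-trips
just as well as a 7-bit value does.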

No. With any native encoding the size of a character is determined by the
needs of the writing system. In UTF-8 it's determined by the Unicode
character order, which was pulled out of someone's ass (though surprisingly
it's always 1 byte for ASCII and 2 for the upper half of iso8859-1).
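
A quick way to see the difference, as a sketch only: the sample characters
and the choice of koi8_r and gb2312 as the "native" encodings are my own
picks for illustration, and encoded_len is a throwaway helper.

```python
def encoded_len(ch: str, encoding: str) -> int:
    """Number of bytes one character takes in the given encoding."""
    return len(ch.encode(encoding))


# (character, a native encoding for it); the samples are arbitrary
samples = [
    ("A", "ascii"),          # Latin capital A
    ("\u00e9", "latin-1"),   # e with acute accent, iso8859-1
    ("\u0416", "koi8_r"),    # Cyrillic capital Zhe
    ("\u4e2d", "gb2312"),    # CJK ideograph "middle"
]

for ch, native in samples:
    print(f"U+{ord(ch):04X}  {native}: {encoded_len(ch, native)} byte(s)"
          f"  utf-8: {encoded_len(ch, 'utf-8')} byte(s)")
```

Everything outside ASCII comes out one byte longer in UTF-8 than in the
encoding designed for that script.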

>
> ISO 10646 has 2^31 possible characters. UTF-16 has 2^16+2^20 possible
> characters. Current plans seem to use about 2^18 of these characters for
> the foreseeable future.

I can make an encoding with 2^1024 characters by assigning numbers to all
possible patterns on a 32x32 pixel map. All characters that can exist
in the world will definitely be there. So what?
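
The arithmetic, as a sketch assuming one bit per pixel (the variable names
are just for illustration):

```python
WIDTH = HEIGHT = 32

bits_per_char = WIDTH * HEIGHT        # one bit per pixel
bytes_per_char = bits_per_char // 8   # storage for a single "character"
char_count = 2 ** bits_per_char       # every possible black/white pattern

print(bits_per_char, "bits,", bytes_per_char, "bytes per character")
print("number of characters has", len(str(char_count)), "decimal digits")
```

So every "character" costs 128 bytes, which is the whole point: a huge code
space by itself proves nothing about the encoding being any good.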

> UTF-8 could easily support 2^36 different characters. That's 2^18 types of
> aliens that need the same amount of characters as we do. That's quite a
> lot.
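
For what it's worth, the numbers quoted above can be checked with the sketch
below. It assumes the usual UTF-8 bit layout and stretches it to a 7-byte
form with lead byte 0xFE, which is a hypothetical extension (the defined
sequences stop at 6 bytes, i.e. 31 bits). utf8_payload_bits is a name
invented here.

```python
def utf8_payload_bits(nbytes: int) -> int:
    """Payload bits in an nbytes-long UTF-8-style sequence (1..7 bytes)."""
    if not 1 <= nbytes <= 7:
        raise ValueError("sequence length must be 1..7")
    if nbytes == 1:
        return 7                      # 0xxxxxxx
    # lead byte: nbytes one-bits, a zero bit, then (7 - nbytes) payload bits;
    # each continuation byte (10xxxxxx) carries 6 more
    return (7 - nbytes) + 6 * (nbytes - 1)


for n in range(1, 8):
    print(n, "byte(s):", utf8_payload_bits(n), "bits")

utf16_chars = 2**16 + 2**20           # the UTF-16 figure from the quote
alien_scripts = 2**36 // 2**18        # 2^18 characters per kind of alien
print(utf16_chars, alien_scripts)
```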

...and BASE64 the above-mentioned encoding to make it suitable for
8-bit-native systems.
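
In case anyone wants to see how that would look, a sketch assuming each
32x32 "character" is stored as 128 raw bytes, one bit per pixel (the blank
bitmap is just a placeholder):

```python
import base64

blank_char = bytes(32 * 32 // 8)          # all pixels off: 128 zero bytes
encoded = base64.b64encode(blank_char)    # text-safe form of one character

print(len(blank_char), "raw bytes ->", len(encoded), "BASE64 bytes")
assert base64.b64decode(encoded) == blank_char   # round-trips losslessly
```

That is 172 bytes on the wire for a single character.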

> Oh, perhaps a very short summary of what all these are, since some people
> seem to have trouble keeping them apart (and I hope I don't confuse things
> myself).

Unlike Germans who explain the benefits of their encodings to Russians,
Chinese, Koreans, Japanese and everyone else, we already know what we are
talking about, thank you.

--
Alex