Re: UTF-8, OSTA-UDF [why?], Unicode, and miscellaneous gibberish

Dan Hollis (goemon@sasami.anime.net)
Wed, 20 Aug 1997 11:30:31 -0700 (PDT)


On Tue, 19 Aug 1997, Teunis Peters wrote:
> On Tue, 19 Aug 1997, Alex Belits wrote:
> > On Mon, 18 Aug 1997, Teunis Peters wrote:
> > > Beyond that the Chinese still (AFAIK) decided whether or not to actually
> > > USE unicode [the language has other ways of creating new characters - this
> > > is not something computers are good at handling], Unicode has largely been
> > > accepted [mostly by fiat].
> > AFAIK, Chinese, Japanese and Russians _oppose_ Unicode that is mostly
> > pushed by people who use iso8859-1 anyway, and thus have trivial mapping
> > between their native charset and Unicode.
> I don't know what the Japanese or Russian reasons are for opposition but
> the Chinese have REALLY good reasons for disliking Unicode....

The problem is that lots of the decisions on Asian encodings in Unicode
were made by non-Asians. Many of the decisions on character unifications
are nonsensical.

Basically, the Japanese look at Unicode as something forced upon them by
idiots who only understand single-byte encodings and European languages
with Latin-based character sets.

From what I understand, the Russians hate Unicode because their nice simple
single-byte KOI8 encoding became mangled into double bytes. And it's not
even sorted in Russian alphabetic order!
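
To put numbers on the single-byte versus double-byte complaint, here is a
minimal sketch in Python. It assumes "KOI8" means the KOI8-R variant and
that "double bytes" means a 16-bit Unicode encoding; the sample word is
only an illustration and does not come from this thread.

```python
# Sketch: byte cost of one Russian word in KOI8-R and in two Unicode encodings.
# Assumption: "KOI8" in the mail means KOI8-R (Python codec name "koi8_r").
word = "Привет"  # six Cyrillic letters, chosen only as an illustration

koi8 = word.encode("koi8_r")      # one byte per Cyrillic letter
utf16 = word.encode("utf-16-be")  # two bytes per letter, no byte-order mark
utf8 = word.encode("utf-8")       # also two bytes per Cyrillic letter

print(len(koi8), len(utf16), len(utf8))  # prints: 6 12 12
```

Under these assumptions UTF-8 does not win back the space for Cyrillic
text: it stays at one byte per character only for ASCII.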

How would Americans like it if the ASCII set were turned into double-byte
encodings, and made into a scrambled mess?

> FWIW - 3 standard encodings in Japanese.... And no way to tell the
> difference. For me anyways Unicode-2.0 [ISO-whatever actually] makes life
> MUCH easier.... I can't really afford to try and hunt down all the
> myriads of encodings anyways.

One of the problems with Unicode is it encourages people to do _really
stupid_ things. Like what's happening with the FTP-WG. Dumb Europeans
making idiotic assumptions about Asian encodings. "If we force everything
to assume Unicode, then Asian languages will only be mis-decoded 10% of
the time. So there's no problem."
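
A rough sketch of why guessing the encoding goes wrong. The thread does not
name the three Japanese encodings, so ISO-2022-JP, Shift_JIS and EUC-JP are
an assumption here, as are the sample string and the helper name try_decode.

```python
# Sketch: one Japanese string, three byte representations.
# Assumption: the "3 standard encodings" are ISO-2022-JP, Shift_JIS and EUC-JP.
text = "日本語"  # illustrative sample, not taken from the thread

encodings = ["iso2022_jp", "shift_jis", "euc_jp"]
encoded = {name: text.encode(name) for name in encodings}

for name, raw in encoded.items():
    # The same three characters produce a different byte string each time.
    print(name, raw.hex())

def try_decode(raw, codec):
    """Hypothetical helper: decode raw with codec, or return None on failure."""
    try:
        return raw.decode(codec)
    except UnicodeDecodeError:
        return None

# Decoding Shift_JIS bytes as if they were EUC-JP does not give the text back.
wrong = try_decode(encoded["shift_jis"], "euc_jp")
print(wrong == text)
```

Nothing in the bytes themselves says which codec produced them, so a
receiver that assumes one encoding for everything will garble or reject
the rest.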

> AFAIK Chinese has about 4-5 encodings, not counting countries that
> incorporate other character sets as well (Korean, Japanese <somewhat>, and
> so on)

Chinese primarily use BIG5 encoding. At least, they do on the web :-)

> Don't say [cut top 8 bits] or [just leave it].... Both solutions mess
> things up. I vote UTF-8 translation.

What's wrong with EUC :-)
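
For comparison, a small sketch of the "cut top 8 bits" option against a
UTF-8 translation. It assumes names arrive as 16-bit Unicode code units;
the function name cut_top_8_bits and the sample characters are made up for
illustration.

```python
# Sketch: truncating 16-bit code units versus translating to UTF-8.
# Assumption: every character fits in one 16-bit unit (Basic Multilingual Plane).

def cut_top_8_bits(text):
    """Hypothetical helper: keep only the low byte of each code unit."""
    return bytes(ord(ch) & 0xFF for ch in text)

cyrillic = "\u0416"  # CYRILLIC CAPITAL LETTER ZHE
latin = "\u0116"     # LATIN CAPITAL LETTER E WITH DOT ABOVE

# Both characters collapse to the same byte, so the original is unrecoverable.
print(cut_top_8_bits(cyrillic) == cut_top_8_bits(latin))

# UTF-8 keeps them apart and decodes back to the original characters.
print(cyrillic.encode("utf-8") != latin.encode("utf-8"))
print(cyrillic.encode("utf-8").decode("utf-8") == cyrillic)
```

This only shows that truncation is lossy while UTF-8 round-trips. It says
nothing about whether EUC would be the better choice for a given system.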

-Dan