Re: unicode (char as abstract data type)

Alex Belits (abelits@phobos.illtel.denver.co.us)
Fri, 17 Apr 1998 14:29:19 -0700 (PDT)


On Fri, 17 Apr 1998, Alan Cox wrote:

> > The problem is, for handling the data in applications UTF-8 is the very
> > worst format ever invented by a human.
>
> In what ways ?

1. While doing absolutely nothing for real language-dependent operations
on text (short of clumsy hacks around it), it makes things seem
"internationalized" when they are not actually usable for languages
outside ISO 8859-1. At the same time, it creates more trouble for a
programmer than any other multibyte encoding for anything except the
most trivial tasks.

2. It's based on Unicode, a standard widely opposed everywhere except in
English-speaking countries (whose opinion doesn't count, especially on
UTF-8, which is binary-indistinguishable from ASCII in the ASCII
character range) and Western Europe (which Unicode specifically
accommodates). For example, every Russian programmer I have met or heard
from (myself among them) considers that "standardization" the equivalent
of spitting in their face.

3. It tries to avoid the unavoidable: multilingual text processing must
use some kind of charset _and_ _language_ labeling to handle the complex
and diverse nature of human languages well and consistently. While
labeling is obviously quite a pain in itself, it (1) can be easily
extended, (2) can use existing localizable or localized software, (3) is
already used and standardized in MIME, even though that scheme needs to
be extended to apply to documents that contain multiple languages, (4)
with reasonable effort, and without modifying existing software and
configurations, can interoperate with everything that exists now, if
such interoperation is possible at all, and (5) is necessary for
non-display-oriented text processing (phonetic matching, speech
generation, statistical text analysis) and even for high-quality
multilingual typesetting.

Unicode just sweeps the dust under the carpet, pretending that the
problem is limited to the storage and visual representation of
multilingual text.
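The byte-level claims in points 1 and 2 can be checked with a short
sketch. It assumes Python 3, uses "ядро" (Russian for "kernel") as
arbitrary sample text, and the helper name char_and_byte_counts is
hypothetical. It shows that UTF-8 matches ASCII byte-for-byte in the
ASCII range, that Cyrillic needs two bytes per character in UTF-8 versus
one in KOI8-R, and that naive truncation at a byte offset can leave
invalid UTF-8.

```python
# Sketch assuming Python 3; the helper below is hypothetical, for
# illustration only.

def char_and_byte_counts(text, encoding):
    """Return (number of characters, number of encoded bytes)."""
    return len(text), len(text.encode(encoding))

ascii_text = "kernel"
russian_text = "ядро"  # Russian for "kernel"

# In the ASCII range, UTF-8 produces exactly the same bytes as ASCII.
assert ascii_text.encode("utf-8") == ascii_text.encode("ascii")

# Cyrillic letters take two bytes each in UTF-8 but one each in KOI8-R.
print(char_and_byte_counts(russian_text, "utf-8"))   # (4, 8)
print(char_and_byte_counts(russian_text, "koi8_r"))  # (4, 4)

# Byte-oriented code that cuts at a fixed byte offset can split a
# character in half, leaving a buffer that is no longer valid UTF-8.
cut = russian_text.encode("utf-8")[:3]
try:
    cut.decode("utf-8")
except UnicodeDecodeError:
    print("truncated buffer is not valid UTF-8")
```

A program that treats byte counts as character counts works for ASCII
and KOI8-R text but not for UTF-8 Cyrillic text.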

--
Alex

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu