[Off-topic] Re: unicode (char as abstract data type)

Lin Zhe Min (ljm@ljm.wownet.net)
Wed, 22 Apr 1998 11:52:24 +0900 (CDT)


This discussion is off-topic, however. I apologise to those who are
allergic to off-topic messages.

On Tue, 21 Apr 1998, Alex Belits wrote:

> You have no knowledge of reasons why people don't like Unicode then.
> "Homegrown" encodings are national standards that have existed for
> decades, and often for very good reasons.

We all remember the good old days, when we used 'copy con foo.com'
or modified the GDT/LDT to take over the whole i386 for ourselves.
Why keep dwelling on those junk days, as one might on the Czar or the
(fake) communist party? When technology advances, things should
change.

I'm Chinese, and I use a terminal which can interpret one of the
Chinese charsets (we call it an interior code in Taiwan). However,
since I need to talk to Japanese people, or to Chinese people using
other charsets, *at the same time*, it is hard to write a multilingual
terminal with input methods without kernel + libc support. Nota bene!
I want them to be displayed _at the same time_, and nowadays only the
incompatible GNU Emacs 20.x (or the old GNU MULE) can do that for me.

And I have trouble using Chinese filenames even after modifying
.inputrc. Sometimes a character in the Chinese Big-5 charset contains
a UNIX-reserved character as one of its bytes, and I have difficulty
copying/moving/... such files without proper care. So WHY NOT CHANGE?
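
The filename problem can be seen at the byte level. Below is a minimal
sketch, assuming the "UNIX-reserved character" meant here is a shell
metacharacter such as the backslash (0x5C), and using Python's built-in
big5 and utf-8 codecs. The helper name ascii_bytes_in is made up for
illustration.

```python
def ascii_bytes_in(ch: str, encoding: str) -> list:
    """Return the ASCII characters found among the encoded bytes of ch."""
    return [chr(b) for b in ch.encode(encoding) if b < 0x80]

# U+8A31 is a common surname character; in Big-5 its second byte is 0x5C.
ch = "\u8a31"

print(ch.encode("big5"))            # two bytes, the second is a backslash
print(ascii_bytes_in(ch, "big5"))   # ['\\']
print(ascii_bytes_in(ch, "utf-8"))  # []
```

In UTF-8 every byte of a multi-byte character is 0x80 or above, so no
byte of a Han character can be mistaken for a shell or path character.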

Whether to use charset tags (which you prefer) or Unicode is not the
issue here, and I don't have enough knowledge to choose between them.
However, I do know that enduring the pain of switching to Unicode
brings these benefits:

1) It maps the Chinese/Japanese/Korean kanji (Han characters) onto one
another. That makes conversion easy and quick.
2) We don't need to draw duplicate glyphs for characters that are
written in the same form.
3) In full-text searching, we can profit from the mapping, so that we
can search not only text in our native language/charset, but also
text in Japanese/Korean that is written with the same character but
looks slightly different because of each nation's own character
simplification.
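
Points 1) and 3) can be sketched as follows: text stored in different
national charsets is decoded to Unicode, after which one query matches
all of it. This is only an illustration. It assumes Python's built-in
big5, shift_jis and euc_kr codecs, and the sample documents and the
helper name find_in_all are invented here. It covers only characters
that share a code point; variants that have their own code points
would need an extra mapping table.

```python
# U+5C71 (mountain) exists in Chinese, Japanese and Korean legacy charsets.
MOUNTAIN = "\u5c71"

# Hypothetical documents, each stored in a different national encoding.
documents = {
    "big5": MOUNTAIN.encode("big5"),
    "shift_jis": MOUNTAIN.encode("shift_jis"),
    "euc_kr": MOUNTAIN.encode("euc_kr"),
}

def find_in_all(query: str, docs: dict) -> list:
    """Decode every document to Unicode and return the encodings that match."""
    return [enc for enc, raw in docs.items() if query in raw.decode(enc)]

# The raw byte sequences differ from charset to charset...
print(len(set(documents.values())))
# ...but after decoding, a single query finds the character in all of them.
print(find_in_all(MOUNTAIN, documents))
```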

None of the above is something a 'temporary' charset-tagging scheme
can do. However, Unicode hasn't considered traditional Chinese fonts
(used in Hong Kong, Taiwan, and all ancient scripts) vs. simplified
fonts (used in mainland China, aka the PRC, by 1.2 billion people). It
puts them both in ONE charset platform. It's messy, and I don't have
any idea yet whether the organisation that manages Unicode wants to
change that. It's a flaw.

> > As it is many Japanese and
> > Chinese people are balking at Unicode because it uses the same
> > values for characters that appear identical in those two
> > languages but that have different meanings. That's because
> > Unicode is a _character set_; it's a code for a set of symbols
> > used in writing, and not for syntactic or semantic information.
> > If the same symbol is used in two different languages, even if it
> > has a different meaning or role, Unicode uses only one code for
> > it.
>
> This is faulty reasoning. Text has a meaning, and lost distinction
> between meanings is lost information.

Let's assume, for the sake of discussion, that your "text" means
"letter". Unicode handles only "letters", not "words". (Definition: a
word is like "Linux"; it has five letters, namely L, i, n, u, x.)

With phonogram letters, that is not the case. A letter has no meaning;
it's only a phonetic symbol which you're used to reading or writing.
An English "a" and a French "a" ARE identical, and a Russian "P" and a
Ukrainian "P" ARE identical, though no one would want to map the
Russian "P" to the French "R". That would be meaningless.
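
The point about phonetic letters can be checked against the Unicode
character names. Here is a small sketch using Python's standard
unicodedata module: the Cyrillic letter shared by Russian and
Ukrainian is a single code point, distinct from both the look-alike
Latin P and the Latin R it sounds like.

```python
import unicodedata

cyrillic_er = "\u0420"  # looks like Latin P, sounds like R
latin_p = "P"
latin_r = "R"

for ch in (cyrillic_er, latin_p, latin_r):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# One code point serves both Russian and Ukrainian text, yet it is never
# merged with the Latin letters, whatever it looks or sounds like.
assert cyrillic_er != latin_p and cyrillic_er != latin_r
```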

However, we use ideogram characters. Each carries an ancient, original
meaning (forgotten by today's youngsters) and may change its sense
when the appearance of the character changes. Fortunately, only people
living in the Han cultural sphere (aka the Kanji cultural sphere) use
ideogram characters nowadays. (If I am wrong, please tell me who else
uses ideogram characters. Not the Tibetans; they have used
phonetic-symbolic letters since ancient times, as have the Indians.
The Vietnamese Ji-Nam disappeared when French priests went to Vietnam
in the 18th century.) And each character, though it may be written
slightly differently in each nation, has the same ancient, original
meaning. NB: "words" composed of "characters" MAY have different
meanings (sometimes ironic) among the three cultures, but that is NOT
what Unicode should take care of, because it handles ONLY letters and
characters.

So what's wrong with the original reply?

.e'osai ko sarji la lojban. ==> Please support the logical language.
.co'o mi'e lindjy,min. ==> Goodbye, I'm Lin Zhe Min.
