Re: unicode (char as abstract data type)

NIIBE Yutaka (gniibe@mri.co.jp)
Sat, 18 Apr 1998 12:00:31 +0900


Ulrich Drepper writes:
> UTF-8 is normally meant to be the external representation of UCS2 or
> UCS4.

Yes.  But it can also be used as an internal representation, and for
something other than UCS2/UCS4.  See Plan 9 for an example of UTF-8
used internally.

Or please look at our attempt to use a UTF-8-like encoding for the
MULE character set in GNU Emacs:
ftp://akebono.etl.go.jp/project/utf-2000-19980416.diff
http://turnbull.sk.tsukuba.ac.jp/Tools/XEmacs/utf-2000.html
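
To illustrate what "UTF-8-like" can mean: the multi-byte layout only
needs an integer code, so the code does not have to be a UCS value.
The sketch below is NOT taken from the utf-2000 patch.  It assumes the
original 1 to 6 byte UTF-8 layout (codes below 2^31), and the function
names code_to_utf8like() and utf8like_to_code() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not taken from the utf-2000 patch.
 * Encodes any integer code below 2^31 with the original 1..6 byte
 * UTF-8 bit layout.  The code need not be a UCS value.
 * Returns the number of bytes written, or 0 if the code is too large. */
size_t code_to_utf8like(uint32_t code, unsigned char out[6])
{
    size_t len, i;

    assert(out != NULL);

    if (code < 0x80) {
        out[0] = (unsigned char)code;
        return 1;
    }
    if (code < 0x800)            len = 2;
    else if (code < 0x10000)     len = 3;
    else if (code < 0x200000)    len = 4;
    else if (code < 0x4000000)   len = 5;
    else if (code < 0x80000000u) len = 6;
    else return 0;

    /* Trailing bytes, filled from the end: 10xxxxxx, six bits each. */
    for (i = len - 1; i > 0; i--) {
        out[i] = (unsigned char)(0x80 | (code & 0x3F));
        code >>= 6;
    }
    /* Lead byte: 'len' one bits, a zero bit, then the remaining bits. */
    out[0] = (unsigned char)((0xFF << (8 - len)) | code);
    return len;
}

/* Hypothetical inverse.  Decodes one sequence from 'in'.
 * Returns the number of bytes consumed, or 0 on malformed or
 * truncated input.  Overlong forms are not rejected in this sketch. */
size_t utf8like_to_code(const unsigned char *in, size_t avail,
                        uint32_t *code)
{
    size_t len, i;
    uint32_t c;

    if (avail == 0)
        return 0;
    if (in[0] < 0x80) {
        *code = in[0];
        return 1;
    }
    if      ((in[0] & 0xE0) == 0xC0) { len = 2; c = in[0] & 0x1F; }
    else if ((in[0] & 0xF0) == 0xE0) { len = 3; c = in[0] & 0x0F; }
    else if ((in[0] & 0xF8) == 0xF0) { len = 4; c = in[0] & 0x07; }
    else if ((in[0] & 0xFC) == 0xF8) { len = 5; c = in[0] & 0x03; }
    else if ((in[0] & 0xFE) == 0xFC) { len = 6; c = in[0] & 0x01; }
    else return 0;

    if (avail < len)
        return 0;
    for (i = 1; i < len; i++) {
        if ((in[i] & 0xC0) != 0x80)
            return 0;               /* not a trailing byte */
        c = (c << 6) | (in[i] & 0x3F);
    }
    *code = c;
    return len;
}
```

How the integer codes are assigned (to MULE characters or to anything
else) is a separate question that this byte layout does not answer.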

Nowadays (especially for Chinese characters), I think that the system
should not define "what a character is".  Instead, we need a mechanism
that lets users define "what a character is" by means of
methods/libraries/features.

Text encoded in a coded character set is not the ultimate solution.
We know that any coded character set is an optimization (a shortcut)
for computer technology.  We need something beyond coded character
sets to get real internationalization and multilingualization.

At least for Japanese, we really need something beyond coded character
sets.  Currently in Japan, novelists, (Chinese/Japanese...) language
researchers, the printing industry, computer scientists, and computer
engineers are battling one another over coded character sets.  These
days we can see articles about the "MOJI-code problem" in major
newspapers (Yomiuri, Asahi, Nikkei).  MOJI means "character" in
Japanese.

Thanks,

-- 
NIIBE Yutaka
