Re: unicode (char as abstract data type)

Alex Belits (abelits@phobos.illtel.denver.co.us)
Fri, 17 Apr 1998 20:19:26 -0700 (PDT)


On Sat, 18 Apr 1998, NIIBE Yutaka wrote:

> Ulrich Drepper writes:
> > UTF-8 is normally meant to be the external representation of UCS2 or
> > UCS4.
>
> Yes. But it can be used for internal representation of something
> other than UCS2/UCS4. See Plan9 for internal representation.

But then it can't formally be called "UTF-8" -- UTF-8 is explicitly
defined as an encoding for Unicode only.
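
To make the distinction concrete, here is a minimal sketch of the
bit-packing scheme taken by itself, assuming the original 1-to-6-byte
layout (as in RFC 2279). The function name encode_utf8_style is
hypothetical. The scheme will pack any integer below 2**31, whatever
character set that integer belongs to; only when the integers are
Unicode code points is the result properly "UTF-8".

```python
def encode_utf8_style(value: int) -> bytes:
    """Pack a non-negative integer below 2**31 using the UTF-8 bit layout.

    Hypothetical helper for illustration: it knows nothing about which
    character set 'value' comes from.
    """
    if value < 0 or value >= 1 << 31:
        raise ValueError("value out of range for the 1-6 byte layout")
    if value < 0x80:
        return bytes([value])  # single byte, identical to ASCII

    # (payload bits available, total length in bytes, lead-byte marker)
    layouts = ((11, 2, 0xC0), (16, 3, 0xE0), (21, 4, 0xF0),
               (26, 5, 0xF8), (31, 6, 0xFC))
    for bits, length, lead in layouts:
        if value < 1 << bits:
            break

    tail = []
    for _ in range(length - 1):
        tail.append(0x80 | (value & 0x3F))  # continuation byte: 10xxxxxx
        value >>= 6
    # Whatever bits remain go into the lead byte, after its marker bits.
    return bytes([lead | value] + tail[::-1])
```

For a value that happens to be a Unicode code point, such as 0xE9, the
output matches real UTF-8; for a value from some other numbering, the
bytes look the same but the name does not apply.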

> Or please look at our attempt for GNU Emacs using UTF-8-like encoding
> for MULE-character set:
> ftp://akebono.etl.go.jp/project/utf-2000-19980416.diff
> http://turnbull.sk.tsukuba.ac.jp/Tools/XEmacs/utf-2000.html
>
> Nowadays (especially for Chinese characters), I think that the system
> should not define "what the character is"; instead, we need a mechanism
> that lets users define "what the character is" by way of
> methods/library/features.
>
> Text by coded character set is not the ultimate solution. We know
> that any coded character set is an optimized solution (a shortcut) for
> computer technology. We need something beyond coded character sets to
> get real internationalization and multilingualization.

If anything is going to solve the original problem, it most likely
will not be a least-common-denominator unified encoding but a unified way
of representing/labeling multiple encodings and invoking their processing
methods. Then, to handle a charset/language previously unknown on one's
system, one will only need to add a set of shared-library "plugins" and
fonts that handle representation, processing and input methods for that
charset/language.
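
A rough sketch of the labeling idea, under loud assumptions: text is
kept as runs of (label, raw bytes), and a registry maps each label to a
handler. The names CharsetHandler, register and render are hypothetical,
and Python's built-in codecs stand in here for the shared-library
plugins; a real system would load those from disk and would also cover
fonts and input methods, which this does not attempt.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class CharsetHandler:
    """Hypothetical plugin interface: one handler per charset label."""
    name: str
    to_display: Callable[[bytes], str]


_handlers: Dict[str, CharsetHandler] = {}


def register(handler: CharsetHandler) -> None:
    """Install a handler; stands in for dropping a plugin into place."""
    _handlers[handler.name] = handler


def render(runs: List[Tuple[str, bytes]]) -> str:
    """Render labeled runs, dispatching each run to its handler."""
    out = []
    for label, raw in runs:
        handler = _handlers.get(label)
        if handler is None:
            # Unknown charset: the bytes are kept intact and shown as a
            # placeholder until someone installs a handler for the label.
            out.append("<%s:%s>" % (label, raw.hex()))
        else:
            out.append(handler.to_display(raw))
    return "".join(out)


# Two stand-in "plugins" built on codecs that ship with Python.
register(CharsetHandler("koi8-r", lambda b: b.decode("koi8_r")))
register(CharsetHandler("euc-jp", lambda b: b.decode("euc_jp")))
```

Nothing in render() needs to change when a new charset appears; only
another register() call is needed, which is the point of the approach.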

The filesystem can simply be kept out of it and provide a byte-transparent
way to use files and their names.
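
As a small illustration of what "byte-transparent" means from user
space, assuming a typical Linux filesystem that accepts any name bytes
except '/' and NUL: the hypothetical helper below creates a file whose
name is given as raw bytes and reads the name back from the directory
listing, never decoding it.

```python
import os
import tempfile


def roundtrip_name(name: bytes) -> bytes:
    """Create a file named by raw bytes and return the name as listed.

    Hypothetical helper; assumes the filesystem stores names as opaque
    bytes, which is not true on every platform.
    """
    with tempfile.TemporaryDirectory() as tmp:
        directory = os.fsencode(tmp)  # work with bytes paths throughout
        with open(os.path.join(directory, name), "wb") as f:
            f.write(b"")
        # Passing a bytes path makes os.listdir return bytes names.
        (listed,) = os.listdir(directory)
        return listed
```

The name comes back exactly as it went in, whether it was KOI8-R,
EUC-JP or anything else; interpreting it is left to whoever displays it.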

--
Alex
