Re: [OFFTOPIC] Re: unicode (char as abstract data type)

Alex Belits (abelits@phobos.illtel.denver.co.us)
Wed, 22 Apr 1998 03:31:01 -0700 (PDT)


On Tue, 21 Apr 1998, Pavel Machek wrote:

> Really, you can not determine charset from language - because language
> of *my* emails is sometimes something between czech and english. And
> now imagine, me wanting to write russian word sabaka (or how is dog
> written). I of course want to write it in azbuka. And I do not want to
> tell my text editor origin of each word I use.

If the editor attaches labels to character sequences, it will know their
origin. English, being supported as the ASCII subset of a non-ASCII
charset, can be used without separate labeling, but again, if necessary,
a switch between languages can be reflected in the labels.
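
As a rough illustration of labeling character sequences, here is a minimal
Python sketch. It assumes the text is stored in KOI8-R and that the editor
already knows which language the non-ASCII part is in (for instance from
the active keyboard layout). The names Run and split_runs are hypothetical,
made up for this example only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Run:
    data: bytes              # raw bytes in the document's charset
    language: Optional[str]  # None means "no separate label" (ASCII subset)


def split_runs(data: bytes, language: str) -> List[Run]:
    """Split bytes into ASCII and non-ASCII runs.

    Assumption: every byte >= 0x80 belongs to `language`; ASCII bytes
    are left unlabeled, as the ASCII subset needs no separate label.
    """
    runs: List[Run] = []
    for byte in data:
        lang = language if byte >= 0x80 else None
        if runs and runs[-1].language == lang:
            runs[-1].data += bytes([byte])   # extend the current run
        else:
            runs.append(Run(bytes([byte]), lang))  # language switch
    return runs


text = "dog = собака".encode("koi8-r")
runs = split_runs(text, "ru")
for run in runs:
    print(run.language, run.data.decode("koi8-r"))
```

The ASCII part stays unlabeled and only the switch to the Cyrillic word
produces a labeled run. A real editor would need finer rules, since spaces
and punctuation inside Russian text are ASCII bytes too.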

> So it is hard to impossible to gather info about language. User just
> will not want to tell you. User may even want to write Geek with 'G'
> in azbuka. Why not?

Information about language is always lost if there is no place to put a
label. Like, in Unicode.

> I believe that language labeling cannot handle
> that.

Why? One can make the labels as detailed about the language as necessary --
labeling assumes that the set of labels is extensible, and matching will
still work because the label contains the charset name. See MIME for an
example of labeling (not as an example of 7-bit encoding, which is
unnecessary).
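
A minimal Python sketch of such extensible labels follows. Label,
LabeledText, same_text and the "extra" field are hypothetical names
invented for this illustration, and comparing by decoding through the
charset name is only one possible reading of "matching" here, not a
description of any existing implementation.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Label:
    charset: str                              # always present
    language: str = ""                        # optional
    extra: Tuple[Tuple[str, str], ...] = ()   # hypothetical extension slot


@dataclass
class LabeledText:
    label: Label
    data: bytes


def same_text(a: LabeledText, b: LabeledText) -> bool:
    """Match on characters, using only the charset part of each label."""
    return a.data.decode(a.label.charset) == b.data.decode(b.label.charset)


word = "собака"
plain = LabeledText(Label("koi8-r", "ru"), word.encode("koi8-r"))
# A more detailed label; the extra attribute is made up for the example.
detailed = LabeledText(
    Label("koi8-r", "ru", (("note", "quoted foreign word"),)),
    word.encode("koi8-r"),
)
other = LabeledText(Label("windows-1251"), word.encode("windows-1251"))

print(same_text(plain, detailed), same_text(plain, other))
```

The extra detail in a label does not affect matching, and text stored in a
different charset still matches even though the raw bytes differ, because
the charset name in the label says how to interpret them.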

--
Alex

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu