> Really, you cannot determine the charset from the language, because the
> language of *my* emails is sometimes something between Czech and English.
> And now imagine me wanting to write the Russian word sabaka (or however
> "dog" is written). Of course I want to write it in azbuka, and I do not
> want to tell my text editor the origin of each word I use.
If the editor attaches labels to character sequences, it will know their
languages. English, being a language supported as the ASCII subset of a
non-ASCII charset, can be used without separate labeling; but again, where
necessary, a switch between languages can be reflected in the labels.
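To illustrate what "attaching labels to character sequences" could look
like, here is a minimal sketch. The `LabeledRun` structure and the two
helper functions are hypothetical and not part of any existing editor. Each
run carries a language tag and a charset name next to its text, so the
charset label says how to decode the bytes and the language label survives
the round trip. Once the runs are joined into one plain string, the language
information is gone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabeledRun:
    """A run of text plus the labels an editor could attach (hypothetical)."""
    language: str  # a language tag such as "en" or "ru"
    charset: str   # a charset name that Python's codecs module understands
    text: str


def encode_runs(runs):
    """Encode each run in its own charset, keeping both labels next to the bytes."""
    return [(r.language, r.charset, r.text.encode(r.charset)) for r in runs]


def decode_runs(encoded):
    """Rebuild the runs: the charset label drives decoding, the language label is kept."""
    return [LabeledRun(lang, cs, data.decode(cs)) for lang, cs, data in encoded]


runs = [
    LabeledRun("en", "ascii", "I saw a "),
    LabeledRun("ru", "koi8_r", "собака"),  # "sabaka" written in azbuka
    LabeledRun("en", "ascii", " today."),
]
encoded = encode_runs(runs)
restored = decode_runs(encoded)

# Joining the runs yields readable text, but this string no longer records
# which part was English and which was Russian.
plain = "".join(r.text for r in restored)
```

The English runs could equally be placed inside a KOI8-R run, since KOI8-R
contains ASCII as a subset; splitting them out here only makes the language
switch explicit.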
> So it is hard to impossible to gather info about the language. The user
> just will not want to tell you. The user may even want to write Geek with
> a 'G' in azbuka. Why not?
Information about the language is always lost when there is no place to
put a label, as is the case with Unicode.
> I believe that language labeling cannot handle
> that.
Why? Labels can be made as detailed about the language as necessary:
labeling assumes that the set of labels is extensible, and matching will
still work because each label contains the charset name. See MIME for an
example of labeling (not for an example of 7-bit encoding, which is
unnecessary here).
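As an illustration of an extensible MIME-style label that still begins
with the charset name, RFC 2231 lets an RFC 2047 encoded-word carry an
optional language tag after a `*`, in the form
`=?charset*language?encoding?text?=`. The sketch below builds and parses
such words. It handles only the "B" (base64) encoding, and the function
names `encoded_word` and `parse_encoded_word` are illustrative, not taken
from any library. A reader that cares only about the charset can simply
ignore everything after the `*`.

```python
import base64


def encoded_word(text, charset, language=None):
    """Build an RFC 2047 encoded-word, optionally with an RFC 2231 language tag.

    Produces =?charset[*language]?B?base64-text?=  (B encoding only).
    """
    label = charset if language is None else f"{charset}*{language}"
    payload = base64.b64encode(text.encode(charset)).decode("ascii")
    return f"=?{label}?B?{payload}?="


def parse_encoded_word(word):
    """Split an encoded-word into (charset, language, text); B encoding only."""
    if not (word.startswith("=?") and word.endswith("?=")):
        raise ValueError("not an encoded-word")
    label, encoding, payload = word[2:-2].split("?")
    if encoding.upper() != "B":
        raise ValueError("only B encoding is handled in this sketch")
    # The charset always comes first, so charset matching works even
    # for readers that do not understand the language part.
    charset, _, language = label.partition("*")
    text = base64.b64decode(payload).decode(charset)
    return charset, language or None, text


russian = encoded_word("собака", "KOI8-R", "ru")
geek = encoded_word("Geek", "KOI8-R", "en-US")  # English text, Cyrillic charset
plain_ascii = encoded_word("dog", "US-ASCII")    # no language label at all
```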
-- Alex