Re: unicode (char as abstract data type)

Alex Belits (abelits@phobos.illtel.denver.co.us)
Fri, 17 Apr 1998 20:03:36 -0700 (PDT)


On Fri, 17 Apr 1998, Albert D. Cahalan wrote:

> Microsoft already did. Library support lets app developers choose
> to use or ignore that support as they desire. As time passes, more
> will choose to use the support.

I can _not_ ignore it if it's there. As a discussion in the IETF FTP-WG
demonstrated, in some cases (such as an FTP directory listing) the only way
to handle an unknown charset at the other end of the wire is to assume
something about it, unless the protocol supports charset labeling (which
FTP, of course, doesn't). Their conclusion: everybody must immediately
abandon all national encodings (which are now used over FTP despite 8-bit
characters in filenames being disallowed by the standard) and get new FTP
servers and clients that use UTF-8 for everything, or else.
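
To illustrate the guessing problem, here is a minimal sketch in Python. It
is not taken from the FTP-WG discussion: the file name, the choice of KOI8-R
and CP1251, and the helper decode_with_guess() are all assumptions made for
the example. The client receives bare bytes with no label and can only pick
a charset and hope.

```python
# Hypothetical file name as stored by a server whose users work in KOI8-R.
name = "файл.txt"
wire_bytes = name.encode("koi8_r")  # what actually travels over the wire


def decode_with_guess(raw: bytes, assumed: str) -> str:
    """Decode unlabeled bytes using whatever charset the client assumes."""
    # errors="replace" keeps the sketch from raising on invalid sequences.
    return raw.decode(assumed, errors="replace")


right = decode_with_guess(wire_bytes, "koi8_r")      # guess matches the sender
wrong = decode_with_guess(wire_bytes, "cp1251")      # another Cyrillic charset
utf8_guess = decode_with_guess(wire_bytes, "utf-8")  # "UTF-8 for everything"

print(right == name)           # the correct guess recovers the name
print(wrong == name)           # the wrong guess decodes cleanly, but to other letters
print("\ufffd" in utf8_guess)  # the bytes are not valid UTF-8 at all
```

The wrong Cyrillic guess is the nasty case: nothing fails, the name simply
comes out as different letters.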

The need for charset and language labeling at the application level
doesn't make Unicode in any way preferable to national charsets, and
unless there is a reliable way to propagate that labeling to the libc
functions (even when data is transferred across the network), there
will be no way to guarantee even the possibility of a lossless round trip
when protocols are combined.
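
A minimal sketch of that round-trip problem, again in Python and again under
stated assumptions: hop() is a hypothetical gateway that recodes data from
one charset to another, and the charsets are picked only for illustration.
When every hop knows the real charset, the bytes survive; when one hop has
to assume, they do not.

```python
def hop(raw: bytes, src: str, dst: str) -> bytes:
    """Hypothetical gateway: decode as `src`, re-encode as `dst`."""
    # Characters that `dst` cannot represent are replaced with "?".
    return raw.decode(src).encode(dst, errors="replace")


original = "файл".encode("koi8_r")

# Label propagated: every hop knows the data started as KOI8-R.
labeled = hop(hop(original, "koi8_r", "utf-8"), "utf-8", "koi8_r")

# Label lost: the first hop assumes Latin-1, then the data is converted back.
unlabeled = hop(hop(original, "latin-1", "utf-8"), "utf-8", "koi8_r")

print(labeled == original)    # round trip is lossless
print(unlabeled == original)  # round trip destroyed the text
```

Using UTF-8 in the middle did not help the second path; what was missing
was the label, not a bigger character set.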

I'm not even trying to go into the details of handling memory-mapped
text files in news servers and other cases where charset information is
definitely present (MIME) and the data is accessible through multiple
protocols (NNTP, HTTP, local filesystem, AFS and CODA), yet there is no
way to reliably propagate the charset down without a huge overhead.

> You can hate Microsoft _and_ steal their best ideas.
> (some of which were stolen from VMS, the MacOS, etc.)

There are better ideas than this, even among Microsoft's own.

--
Alex
