Re: unicode (char as abstract data type)

Alex Belits (abelits@phobos.illtel.denver.co.us)
Fri, 17 Apr 1998 23:12:45 -0700 (PDT)


On Fri, 17 Apr 1998, Albert D. Cahalan wrote:

> > I can _not_ ignore it if it's there. As some discussion in IETF FTP-WG
> > demonstrated, in some cases (such as FTP directory) the only way to handle
> > unknown charset at the other end of the wire is to assume something about
>
> Since when did we put FTP service into the kernel?

FTP used kernel system calls the last time I checked. It doesn't
transfer charset information from the remote end, and it has no means
to do so. NFS, however (quite a kernel-related thing), knows nothing
about libc, and combining it with FTP will produce something definitely
horrible if any non-Unicode charset is used.

> You really can ignore UCS2 at the kernel interface, because libc
> can hide it from you. You may continue to live in a KOI-8 world.

What if I use multiple charsets and don't want the kernel to meddle? Or
what if I use a database?

> This could even help you: if you ever need to mount an SMB share,
> you can let the UCS2 --> KOI-8 conversion happen in libc instead
> of in the kernel.

Great. Then everything that comes through that application will be
assumed to be in KOI8? How will I keep filenames stored in text files
consistent with the ones on the filesystem if boxes with multiple
charsets access the same file? Say it's a Makefile, and it mentions
something in the local charset. I use it from another user's account, and
it will look for files that aren't on the server, because the byte
sequence in the Makefile is assumed to be in his charset and so
translates to Unicode differently. No, thanks.
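
A minimal sketch of that failure, assuming a hypothetical server that
stores names as Unicode and converts using whatever charset each client
declares. The file name, the lookup() helper and the choice of KOI8-R
and CP1251 as the two clients' charsets are illustrative assumptions,
not part of any real protocol:

```python
# Bytes of a file name as a KOI8-R user typed them into a Makefile
# (hypothetical example name).
NAME_BYTES = b"\xd4\xc5\xd3\xd4.c"

def to_unicode(raw: bytes, charset: str) -> str:
    """Convert raw name bytes to Unicode under the given charset."""
    return raw.decode(charset)

# The file was created by the KOI8-R user, so the hypothetical server
# holds the Unicode name obtained through that user's charset.
server_dir = {to_unicode(NAME_BYTES, "koi8_r")}

def lookup(raw: bytes, client_charset: str) -> bool:
    """Look up a name taken from the Makefile, converted with the
    charset of whoever happens to be running make."""
    return to_unicode(raw, client_charset) in server_dir

print(lookup(NAME_BYTES, "koi8_r"))   # same charset as the creator
print(lookup(NAME_BYTES, "cp1251"))   # another user's charset
```

The Makefile bytes never change; only the charset assumed for the
conversion does, and that alone is enough to make the second lookup miss.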

> With the Unix extensions, SMB over TCP/IP could
> become quite popular with Linux users.

I don't know; I have never used such a setup.

> > I'm not even trying to go into details of handling memory-mapped
> > text files in newsservers and other cases where charset information
> > is definitely present (MIME), accessible through multiple protocols
> > (NNTP, HTTP,
>
> I don't see those in the kernel.

Exactly -- they aren't, and they will translate on the server side, using
the server's encoding...

> > local filesystem,

...this one has no client and server, and the reader will use its own
charset...

> > AFS and CODA

...and these are in the kernel, so they won't translate on the server,
and the client side will translate using the client's encoding.

> If you mount a filesystem, you must know the encoding.

Are we in Windows? Root mounts a filesystem, and his language is
undefined. Multiple users with different *languages* may use that
filesystem, and sometimes users with incompatible charsets will have to
open the same files. I don't see it as a big problem if "make" output
contains filenames that I can't read (or pronounce, or understand), but I
definitely want them to be consistent when it creates a command line.
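
A rough sketch of the consistency asked for here, assuming names are
passed around as opaque bytes and only decoded for display. The
directory set, build_command() and display() are hypothetical helpers
for illustration:

```python
# Names are kept as raw bytes from the Makefile to the command line.
NAME = b"\xd4\xc5\xd3\xd4.c"          # hypothetical file name
directory = {NAME}                     # what the filesystem stores

def build_command(source: bytes) -> list:
    """Build an argument vector without ever converting the name."""
    return [b"cc", b"-c", source]

def display(raw: bytes) -> str:
    """Render a name for a reader whose charset cannot show it.
    Unreadable bytes are escaped; the underlying name is untouched."""
    return raw.decode("ascii", errors="backslashreplace")

cmd = build_command(NAME)
print(cmd[-1] in directory)   # the name on the command line still matches
print(display(cmd[-1]))       # unreadable, but consistent
```

The displayed form may be meaningless to a given user, yet the bytes
handed to the compiler are the same ones stored in the directory.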

--
Alex
