Unicode details (no war!), the kernel, and filenames

Teunis Peters (teunis@usa.net)
Wed, 27 Aug 1997 11:16:29 -0600 (MDT)


On Thu, 21 Aug 1997, Darin Johnson wrote:

> > This is their prerogative. However BIG5 is not suitable as an
> > interchange-encoding for Linux (at least not outside China).
>
> Odd, countless Chinese speakers outside of China use it :-)
> And many of them use it on Linux.

Any info where I could find source? Specs?

[clipped by James Mastros <root@jennifer-unix.dyn.ml.org> on interface to
translation module - could work]
>
> This sounds somewhat OK. Unicode is fine for internal formats.
> But I don't think more problems are solved than are introduced.
>
> One ugly part is that for a file, the contents are the responsibility
> of user space, and the file name is the responsibility of kernel space.

Umm - how difficult would it be for userspace to handle filenames?
[this may seem kinda strange until you look at a filesystem as yet another
database system... it'd be kinda fun to name a file the colour blue :]

> Another ugly part is, you don't know what encoding most FS's actually
> use. That is, if you've got a file name on ext2fs, how do you know
> how to convert it to UTF-8? Or an imported ufs disk? What if ext2fs
> has some files in one encoding, and others in a different one?

Hmm... Standards are good [pity there are so many]. Though by and large
things have supposedly standardized on UTF-8 (if anyone bothers paying
attention).
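
For what it's worth, here is a rough sketch of the guessing game Darin
describes, done in userspace. Everything in it is an assumption for
illustration: the name decode_filename is made up, and Latin-1 is just an
arbitrary fallback, since nothing on disk records which encoding was used.

```python
from typing import Tuple

def decode_filename(raw: bytes, fallback: str = "latin-1") -> Tuple[str, str]:
    """Decode raw on-disk filename bytes; return (text, encoding_used)."""
    try:
        # Strict UTF-8 decoding rejects most legacy 8-bit and Big5 byte strings.
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # This is only a guess: the filesystem does not say what was meant.
        return raw.decode(fallback), fallback

print(decode_filename(b"caf\xc3\xa9.txt"))  # valid UTF-8
print(decode_filename(b"caf\xe9.txt"))      # not UTF-8, falls back to Latin-1
```

Passing fallback="big5" works the same way, though that decode can itself
fail on bytes that are not valid Big5, which is rather the point: you can
detect "not UTF-8", but not what it actually is.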

> For NTFS, yes it makes sense to convert to UTF-8 and pass that on;
> because we know exactly what encoding it always uses, and we need to
> handle this in kernel space (so we don't have null chars in
> filenames). Yes, you've just solved NTFS's problem; but that could
> have been done solely inside of the NTFS handler.

And VFAT and the CD filesystems (DVD, Joliet <grr>) and SMB... any other
takers?
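
A quick sketch of the null-char point, assuming the names arrive as
UTF-16LE bytes (the helper name utf16le_name_to_utf8 is made up, and this
is userspace Python, not what a kernel handler would look like):

```python
def utf16le_name_to_utf8(raw: bytes) -> bytes:
    """Re-encode a UTF-16LE filename as UTF-8 bytes."""
    return raw.decode("utf-16-le").encode("utf-8")

on_disk = "r\u00e9sum\u00e9.txt".encode("utf-16-le")
converted = utf16le_name_to_utf8(on_disk)

# Every ASCII character in UTF-16 carries a zero byte, which a
# NUL-terminated C string would treat as the end of the name.
print(b"\x00" in on_disk)    # True
# UTF-8 only produces a zero byte for U+0000 itself.
print(b"\x00" in converted)  # False
```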
Hey - this might even solve some of the translation problems with HFS
(Macintosh - ':' is invalid and '/' is acceptable in a filename)
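
One possible translation, purely as a sketch: swap the two characters on
the way in and out. The function name swap_hfs_separators is hypothetical
and this is just one mapping that happens to be reversible, not anything
an existing HFS driver is known to do.

```python
# Translation table that exchanges '/' and ':' in one pass.
_SWAP = str.maketrans("/:", ":/")

def swap_hfs_separators(name: str) -> str:
    """Map an HFS name to a Unix-safe one, or back again."""
    return name.translate(_SWAP)

mac_name = "Notes 8/27"
unix_name = swap_hfs_separators(mac_name)
print(unix_name)                       # Notes 8:27
print(swap_hfs_separators(unix_name))  # Notes 8/27
```

Since ':' can't appear in an HFS name and '/' can't appear in a Unix one,
the swap never collides with a character that was already there.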

I don't know who plans on using DVD disks, but if there's going to be
support, UTF-8 is mandatory (or was that UCD? I thought it was UTF-8.
Much the same, slightly different encoding)... (either that or Linux joins
Mickysloth in inventing new standards).

G'day, eh?
- Teunis

PS : to reiterate, how difficult WOULD it be to make filenames completely
a userspace issue?