Re: JFS default behavior

From: Linus Torvalds
Date: Sat Feb 14 2004 - 21:42:41 EST




On Sun, 15 Feb 2004, Robin Rosenberg wrote:
> >
> > Bullshit. It has _nothing_ to do with characters, wide or not. For the system,
> > filenames are opaque. The only things that have special meanings are:
> > octet 0x2f ('/') splits the pathname into components
> > "." as a component has a special meaning
> > ".." as a component has a special meaning.
> > That's it. The rest is never interpreted by the kernel.
>
> I know how it is (to some degree), and it's wrong. The user sees inside the filename
> and sees a string of characters, not a byte sequence.

Yes, the user sees a string of characters, but the octet 0x2f ('/') and
the terminating NUL character '\0' are still perfectly normal characters
and there is no confusion.

The reason: UTF-8. It's the only sane encoding (apart from a pure extended
ASCII setup, which is also sane, but is obviously unacceptable for a large
portion of the world).
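
A minimal sketch of why that holds, in Python purely for illustration: every
byte of a multi-byte UTF-8 sequence has its high bit set, so the octets 0x2f
and 0x00 can only ever mean '/' and NUL. The helper `split_like_kernel` is a
made-up name that mimics a byte-oriented path walk; it is not kernel code.

```python
def split_like_kernel(path: bytes) -> list:
    """Split on octet 0x2f only, the way a byte-oriented path walk would.

    Hypothetical helper for illustration; not how the kernel is implemented.
    """
    return [component for component in path.split(b"/") if component]


# "\u012f" is LATIN SMALL LETTER I WITH OGONEK, chosen because its
# UTF-16LE form is the byte pair 2f 01.
name = "dir/\u012f.txt"

utf8 = name.encode("utf-8")
utf16 = name.encode("utf-16-le")

# UTF-8: U+012F becomes c4 af, both bytes >= 0x80, so only the real '/' splits.
utf8_parts = split_like_kernel(utf8)

# UTF-16LE: U+012F becomes 2f 01, so a bogus '/' octet shows up mid-character.
utf16_parts = split_like_kernel(utf16)

# UTF-8 text never contains a NUL byte unless the text contains U+0000;
# UTF-16LE pads every ASCII character with one.
utf8_has_nul = b"\x00" in utf8
utf16_has_nul = b"\x00" in utf16

print(utf8_parts)
print(utf16_parts)
print(utf8_has_nul, utf16_has_nul)
```

With UTF-8 the name splits into the two components the user intended. With
UTF-16LE it splits in the wrong places, and any NUL-terminated interface would
cut the string off after the first byte pair.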

If some misguided person has told you about UCS-2 and horrors like UTF-16,
just ignore them. They are crazy and deluded, and - perhaps more
importantly - stupid.

In short: the kernel talks bytestreams, and that implies that if you want
to talk to the kernel, you HAVE TO USE UTF-8.

At which point there are no locale issues any more. The only locale issue
you can have is user space mistaking a stream of bytes as extended ASCII,
which will cause all your pretty UTF-8 characters to be shown as strange
latin1 (or other) squiggles.
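
The squiggles are easy to reproduce. The sketch below (Python, illustrative
only; the file name is an arbitrary example) takes a UTF-8 byte stream and
decodes it the way a latin1-assuming program would.

```python
# "\u00ef" is LATIN SMALL LETTER I WITH DIAERESIS.
name = "na\u00efve.txt"

# The bytes that actually get handed to the kernel: ... c3 af ...
raw = name.encode("utf-8")

# User space wrongly assuming the bytes are latin1: each byte of the
# two-byte sequence is shown as its own character.
misread = raw.decode("latin-1")

# User space treating the same bytes as UTF-8 gets the name back intact.
correct = raw.decode("utf-8")

print(misread)
print(correct)
```

The bytes stored on disk are the same in both cases; only the interpretation
in user space differs.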

> It seems you simply don't want to understand the problem, which is that users
> CAN HAVE DIFFERENT LOCALES on the same system and on different systems.
> Sigh...

People understand the problem. And UTF-8 is the solution.

It's getting there. I think even Microsoft has seen the light, and is
phasing out their crapola (UCS-2LE? Whatever).

> I'm less concerned with which solution than that a solution should be found. So it
> seems no file system has a solution today. Still an iocharset option would relieve
> the problem for removable media and multi-boot systems.

No. Things like "iocharset" are not the solution. They are literally the
_problem_. The solution is to use something that not only acts as ASCII,
but also has a wide enough range to cover the whole required space (UCS-2
fails _both_ of these fundamental tests). At which point "iocharset" makes
no sense any more, and only exists as a way to translate legacy crap into
the one true format.
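
Both tests, and the "translate legacy crap" step, can be sketched in a few
lines of Python. Assumptions: Python has no UCS-2 codec, so "utf-16-le" stands
in for UCS-2LE here (the two are byte-identical for characters up to U+FFFF),
and `to_utf8` is a hypothetical helper, not how any mount option is actually
implemented.

```python
def to_utf8(raw: bytes, legacy_charset: str) -> bytes:
    """Re-encode a name from a legacy charset into UTF-8.

    Hypothetical helper for illustration only.
    """
    return raw.decode(legacy_charset).encode("utf-8")


# Test 1: does it act as ASCII?
ascii_name = "README"
utf8_is_ascii = ascii_name.encode("utf-8") == ascii_name.encode("ascii")
ucs2_is_ascii = ascii_name.encode("utf-16-le") == ascii_name.encode("ascii")

# Test 2: is the range wide enough?
# U+1D11E (MUSICAL SYMBOL G CLEF) lies outside the 16-bit range of UCS-2.
clef = "\U0001D11E"
fits_in_ucs2 = ord(clef) <= 0xFFFF
clef_utf8 = clef.encode("utf-8")  # four bytes: f0 9d 84 9e

# Legacy translation: a latin1 name converted once, then stored as UTF-8.
legacy = "caf\u00e9".encode("iso-8859-1")   # b"caf\xe9"
converted = to_utf8(legacy, "iso-8859-1")   # b"caf\xc3\xa9"

print(utf8_is_ascii, ucs2_is_ascii)
print(fits_in_ucs2, clef_utf8.hex())
print(converted)
```

UTF-8 passes both checks, and the 16-bit stand-in fails both.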

And that one true format is UTF-8. End of story. If you try to talk to the
kernel in UCS-2 or anything else, you _will_ fail.

Linus