Re: JFS default behavior (was: UTF-8 in file systems? xfs/extfs/etc.)

From: Robin Rosenberg
Date: Thu Feb 12 2004 - 14:42:27 EST


On Thursday 12 February 2004 20.08, you wrote:
> > There are many ways of getting things wrong. The algorithm for encoding
> > UTF-8 doesn't give you the option of encoding 65 as two bytes; any UCS-4
> > character with code 0-0x7F must result in a onand the same principle goes
> > for every other character and the unicdeo standard forbids the use of anything
> > but the shortest possible sequence.
>
> The recommended encoding algorithm forbids anything but the shortest
That algorithm is the /definition/ of UTF-8, not just an example. Sure you can actually
do it another way, but the result is uniquely defined (or else it's not UTF-8).

> Well, as long as every userspace implementation gets it correct, we'll
> be OK. Personally, I doubt they all will, especially those that
> convert from legacy encodings to Unicode, although quite possibly the
> above scenario with combining characters is not likely to happen for
> filenames. Or is it? What about copying a file from a filesystem
> with a UTF-8 encoding to a filesystem with a legacy encoding, and then
> back again?

Sounds like you think we want to invent a new problem. The problem is
here and it's real (not in the U.S, but the the rest of the world). There are
Network file systems (samba in particular), partitions belonging to other
OS's (ntfs, fat or even other Linux installation on the same machine),
removable devices etc etc.

Microsoft introduced a kludge for managing long file names in a short filename
context. Since Linux doesn't have the length limit a nicer kludge could be used
to represent unicode as non-unicode in userspace like a Uxxxxx. When there is
a mismatch there has to be kludge, but it's still many times better than a bunch
of character that look like garbage (and cause legacy application so choke).

-- robin
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/