Re: JFS default behavior / UTF-8 filenames

From: kernel
Date: Thu Feb 19 2004 - 18:49:51 EST


On Thu, Feb 19, 2004 at 08:05:06AM -0600, Dave Kleikamp wrote:
> The arbitrary string of bytes is treated as the latin1 charset in that
> it is stored as 0x00nn (in UTF2), but JFS really doesn't care what the
> character set is.

While I don't really care one way or the other about the whole
"rejecting non-UTF8 filenames" thing, trying to store 8-bit strings in
UTF2 (no such thing, is there? Is JFS UCS-2 or UTF-16?) seems really
ugly, at least in general. Maybe it's not so bad in JFS's case
specifically, since JFS filesystems aren't often shared between Linux
and non-Linux systems.
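The "stored as 0x00nn" scheme Dave describes can be modelled in a few lines. This is a minimal sketch under stated assumptions. The function names are hypothetical and nothing here is taken from the JFS source. It only shows why such names look like Latin-1: widening byte 0xnn to the 16-bit unit 0x00nn yields the same code points as decoding the bytes as Latin-1.

```python
# Illustrative model of storing arbitrary filename bytes as 16-bit code
# units with a zero high byte (0x00nn). Function names are hypothetical;
# this is not taken from the JFS source.

def bytes_to_ucs2_units(name: bytes) -> list[int]:
    """Widen each byte 0xnn of a filename to the code unit 0x00nn."""
    # Iterating over bytes yields ints 0..255, so the high byte is always 0.
    return list(name)

def ucs2_units_to_bytes(units: list[int]) -> bytes:
    """Narrow code units back to bytes, refusing any non-zero high byte."""
    if any(u > 0xFF for u in units):
        raise ValueError("code unit has a non-zero high byte")
    return bytes(units)

# The widened units equal the code points obtained by decoding the same
# bytes as Latin-1, which is the sense in which names are "treated as latin1".
name = b"caf\xe9"
units = bytes_to_ucs2_units(name)
print([hex(u) for u in units])  # ['0x63', '0x61', '0x66', '0xe9']
```

For names created this way the round trip is lossless. The interesting case is a name already on disk whose high byte is not zero. The strict `ucs2_units_to_bytes` above refuses that name rather than guessing.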

But if JFS uses that "make the high byte zero and return the low byte
only" scheme, what does it do when it encounters a UCS-2 filename that
has a non-NUL high byte on an existing filesystem? I can't see any way
of dealing with this that isn't far more broken than simply
refusing to create filenames that aren't valid in the current encoding.
If it throws the high byte away, then you've made it impossible to open
said files. Worse, each character's high byte can take any of 256
values, so up to 256^n distinct names of n characters can now appear
to have the same filename.
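The failure modes of discarding the high byte can be made concrete with a small model. This is a sketch that assumes, hypothetically, a driver which narrows on-disk UCS-2 names by keeping only the low byte and widens incoming names back to 0x00nn. That is the scenario being questioned, not confirmed JFS behavior, and every name below is made up for illustration. The model shows two distinct entries collapsing to one spelling. It also shows that the entry with a non-zero high byte can no longer be looked up. For a name of n code units, up to 256^n names share each spelling.

```python
# Hypothetical model of a driver that narrows on-disk UCS-2 names by
# discarding the high byte. This is the scenario being asked about,
# not confirmed JFS behavior; all names here are made up.

def drop_high_byte(units: list[int]) -> bytes:
    """Lossy narrowing: keep only the low 8 bits of each 16-bit code unit."""
    return bytes(u & 0xFF for u in units)

def widen(name: bytes) -> list[int]:
    """Mapping applied to a name coming from userspace: 0xnn -> 0x00nn."""
    return list(name)

def reachable(entry: list[int]) -> bool:
    """Can the name readdir() would report be used to find this entry again?"""
    return widen(drop_high_byte(entry)) == entry

def names_sharing_spelling(length: int) -> int:
    """Upper bound on names of `length` code units that narrow to one byte string."""
    # Each position's high byte can independently be any of 256 values.
    return 256 ** length

# Two distinct entries already on disk: "Ab" (U+0041 U+0062) and
# "\u0141b" (U+0141 U+0062, L with stroke), the latter perhaps written by another OS.
directory = [[0x0041, 0x0062], [0x0141, 0x0062]]
listed = [drop_high_byte(e) for e in directory]

print(listed)                             # [b'Ab', b'Ab']
print([reachable(e) for e in directory])  # [True, False]
print(names_sharing_spelling(2))          # 65536
```

Both entries are listed as `b'Ab'`. A lookup of that name widens to U+0041 U+0062 and always finds the first entry, so the second one cannot be opened by any byte-string name.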

So what does JFS do in its "throw away the high byte and store binary
character strings in the low byte" mode? How does it deal with an
existing filesystem that has filenames that don't conform to said rule?