Re: confusion and case problems: utf8 <-> iocharset
From: Andrey Borzenkov
Date: Thu Jul 13 2006 - 12:24:06 EST
Eduard Bloch wrote:
> Hello (to whom it may concern),
>
> I try to understand how the charset mapping with VFAT/Joliet and I found
> some inconsistencies between the user expectations, the docs, and the
> actuall behaviour.
>
> Users view:
>
> VFAT, NTFS and Joliet use a Unicode charset for storing the names
> internaly.
Nope. VFAT is using short name as long as it complies with MSDOS; this name
is stored in codepage character set.
[...]
>
> Second:
> there is the "utf8" option. How does that exactly differ from
> iocharset=utf8? There is not clear explanation in vfat.txt. What happens
> if you use both options, especially if iocharset!=utf8? Which one is
> prefered?
>
You actually need both; utf8 just says it is OK not to try to mangle names;
it does not substitute iocharset option.
> Third:
> how can I disable all that funny letter case conversions? They are not
> described anywhere properly,
man mount
> nor the way to disable them.
man mount
> IMO there are
> two problems:
>
> - what you write to the FS is not the same what "ls" shows you later.
> Eg. ABW becomes "abw" but "ABWÖ" becomes "ABWÖ". Abcd becomes "Abcd"
> but "ABC" becomes "abc". Does it make sense? NO.
tell this to Microsoft. It is how VFAT works. Although umlaut should
probably be considered as MSDOS-safe, at least with proper codepage option.
> I would like to stop the kernel playing such games, I had enough of
> such trouble back in my Windows 98 times.
>
> - this case conversion can actually break things. When iocharset=utf-8
> and utf8 are used, then you cannot access the data with the same
> name after storing it.
>
mount shortname=mixed is probably what you want.
> zombie:/tmp# mkdir test/TEST
> zombie:/tmp# ls test
> test
> zombie:/tmp# ls test/test
> zombie:/tmp# ls test/TEST
> ls: test/TEST: No such file or directory
{pts/1}% sudo mount -t vfat -o
loop,shortname=mixed,utf8,uid=bor /var/tmp/test.dos /tmp/x
{pts/1}% LC_ALL=C ll /tmp/x/test
total 2
drwxr-xr-x 2 bor root 2048 Jul 13 20:06 TEST/
{pts/1}% LC_ALL=C ll /tmp/x/test/TEST
total 0
-rwxr-xr-x 1 bor root 0 Jul 13 20:06 ?*
-rwxr-xr-x 1 bor root 0 Jul 13 20:06 ?*
-rwxr-xr-x 1 bor root 0 Jul 13 20:06 ?*
-rwxr-xr-x 1 bor root 0 Jul 13 20:06 ?*
The filesystem is foobared already because it contains both capital and
lowercase versions of the same names. This is what happens when you try to
use wrong codepage.
-andrey
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/