Re: [PATCH] Sanitize filesystem NLS handling

From: Alexander E. Patrakov
Date: Mon Mar 19 2007 - 11:04:16 EST


OGAWA Hirofumi wrote:

No, the utf8 support of vfat is wrong. This is implementation
thing, and it is not recommended until it is fixed.

Could you please explain your specific problem with screenshots, preferrably by running my LiveCD or Debian Etch in an emulator such as qemu or VMware?

But, anyway, this is a separate issue that my patch doesn't attempt to correct. The conclusion so far is that we disagree, and that there are situations where using utf8 iocharset is the least of all evils, so the warning is not justified enough. Reproducible testcase:

1) Install MS Windows 2000 or XP (English aka Pan-European version is just fine)
2) When the installer asks about locale settings, choose that you (a) want to have Cyrillic group of languages, (b) that your locale/location is Russia, (c) use Russian as a language for non-Unicode programs, and (d) add Russian keyboard layout in addition to the existing English layout if the system doesn't do this for you.
3) Using Windows, create a file on a floppy or flash drive. Press F2 to rename it. Press Alt+Shift to switch to Russian keyboard layout. Type a new file name. Remember (or copy to paper) what the letters that make the file name look like in Windows.
4) Boot my LiveCD and choose "Russian (UTF-8)" from the locale dialog (this matches the setup found in all modern distros such as Fedora). On the second screen, confirm the default settings. (Alternatively, install Debian Etch and select Russian during installation).
5) Try to mount the floppy or flash drive to /mnt using various iocharsets. List the directory contents with "ls -l". You will find that utf8 is the only iocharset that gives the correct letters.

OTOH, if you select Russian (KOI8-R) or Russian (CP1251), you will find that the only iocharset giving the correct letters will be koi8-r or cp1251, respectively.

I'm talking about two filesystems on a system here, not two encoding
on one filesystem.

I am also talking about this. Mounting two filesystems with different iocharsets is insane, because this will result in one of the following outcomes:

1) "ls" will show wrong characters in filenames on one of the filesystems
2) one of the two filesystems will contain wrong on-disk data for filenames, that, when misinterpreted by mounting with wrong iocharset, results in seemingly-correct output, but is misunderstood by the properly set up reference implementation (that's what is likely to happen with jfs in your example).

Testcase for (2), to be performed with my LiveCD:

1) Boot the CD, select "Russian (KOI8-R)" from the locale dialog. Agree with the default settings on the second screen.
2) Mount an empty floppy or flash drive with the (wrong) "cp1251" iocharset.
3) Using the right Ctrl key to switch between between English and Russian keyboard layouts, create a file with Russian letters in its name under /mnt.
4) umount /mnt
5) Install Windows 2000 or XP with Russian-specific settings, as described above.
6) Look what Windows finds on the floppy or flash. It is not the same filename you typed, but a sort of garbage (mojibake).

--
Alexander E. Patrakov
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/