Re: Long file names in VFAT broken with iocharset=utf8

From: Roland Kuhn
Date: Mon May 07 2007 - 14:07:48 EST


Hi Andrey!

On 7 May 2007, at 19:51, Andrey Borzenkov wrote:

> This was reported on one of the Russian forums. It was not possible to
> archive (under Linux, using tar) a vfat directory whose files had long
> Russian names (really long, over 150 - 170 characters): tar reported a stat
> failure. With plain ls, the file names appeared truncated.
>
> Looking at the current (2.6.21) fat driver, __fat_readdir allocates a large
> enough buffer (PAGE_SIZE-522) for the UTF-8 name, but for iocharset=utf8 it
> calls uni16_to_x8(), which artificially limits the length of the UTF-8 name
> to 256. That is obviously not enough for a long Russian string in UTF-8
> (2 bytes per character), not to mention the theoretical general case of
> 6-byte UTF-8 characters.
>
> The vfat_lookup()->...->fat_search_long() call chain apparently has a
> similar problem, except that it appears to be broken even in the "utf8"
> case, because fat_search_long allocates a fixed 256-byte buffer for the
> UTF-8 name.
>
> Am I off track here?
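
The byte counts in the report above can be checked with a short sketch. This
is only an illustration in Python, not kernel code: the file name is made up,
BUF_LIMIT stands for the 256-byte limit described in the report, and the
255-unit figure is the usual VFAT long-name maximum, stated here as an
assumption about the on-disk format rather than something taken from the
driver source.

```python
# Sketch: how many UTF-8 bytes does a long Cyrillic file name need?
# The name is hypothetical; only the arithmetic matters.

BUF_LIMIT = 256        # byte limit described in the report (assumed here)
VFAT_MAX_UNITS = 255   # assumed VFAT long-name maximum, in UTF-16 code units

name = "я" * 170       # made-up 170-character Russian file name

# Length as stored on disk (UTF-16 code units) and as seen by user space (UTF-8 bytes)
utf16_units = len(name.encode("utf-16-le")) // 2
utf8_bytes = len(name.encode("utf-8"))

fits_on_disk = utf16_units <= VFAT_MAX_UNITS   # the name is legal for VFAT
fits_in_buffer = utf8_bytes <= BUF_LIMIT       # but its UTF-8 form may not fit

# Characters in the Basic Multilingual Plane need at most 3 UTF-8 bytes per
# UTF-16 unit, so this is the largest UTF-8 size a maximal name can reach.
worst_case = VFAT_MAX_UNITS * 3

print(utf16_units, utf8_bytes, fits_on_disk, fits_in_buffer, worst_case)
```

Under these assumptions the name is valid on disk but needs more than 256
bytes once converted, which would be consistent with the truncated names seen
in ls and the stat failure from tar.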

PATH_MAX specifically counts _bytes_, not characters, so UTF-8 does not
matter. ISTR that PATH_MAX was 256 at some point, but I just quickly grepped
/usr/include and found various mentions of 4096, so where is the central
repository for this configuration item? A hard-coded value of 256 somewhere
inside the kernel smells like a bug.
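
As a rough way to see what limits a running system actually reports, one can
ask pathconf instead of grepping headers. The sketch below uses Python's
os.pathconf on "/" purely as a convenient directory; the values depend on the
file system queried, so no particular output is claimed. Note that the limit
for a single directory entry is NAME_MAX, while PATH_MAX covers a whole path.

```python
import os

# Query the limits reported for one directory. "/" is an arbitrary choice;
# pass the mount point of the vfat file system to see its own values.
directory = "/"

name_max = os.pathconf(directory, "PC_NAME_MAX")  # longest single name, in bytes
path_max = os.pathconf(directory, "PC_PATH_MAX")  # longest path, in bytes

print("NAME_MAX:", name_max)
print("PATH_MAX:", path_max)
```

Both values are byte counts, so a name that is short in characters can still
exceed them once encoded as UTF-8.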

Ciao,
Roland

--
TU Muenchen, Physik-Department E18, James-Franck-Str., 85748 Garching
Telefon 089/289-12575; Telefax 089/289-12570
--
CERN office: 892-1-D23 phone: +41 22 7676540 mobile: +41 76 487 4482