Re: [PATCH] fs: fat: add check for dir size in fat_calc_dir_size

From: OGAWA Hirofumi
Date: Fri Jul 03 2020 - 15:13:15 EST


Anupam Aggarwal <anupam.al@xxxxxxxxxxx> writes:

>>So what was the root cause of slowness on big directory?
>
> The problem happened on a FAT32-formatted 32GB USB 3.0 pendrive, which
> has 20GB of data; the cluster size is 16KB. It has one corrupted
> directory whose size calculated by fat_calc_dir_size() is 1146896384
> bytes, i.e. 1.06 GB.
>
> When traversal of the corrupted directory starts, the directory
> entries look corrupted and lookups fail for them. Some directory
> entry names have the format abc/xyz; the following are a few of the
> observed directory entry names:

[...]

> During a search for a single name in fat_search_long(), the whole
> corrupted directory of size 1.06GB is traversed, which takes around
> 230 to 240 secs and finally ends up returning ENOENT.
>
> Now multiple lookups in the corrupted directory make "ls -lR"
> never-ending, e.g. in an overnight test of running "ls -lR" on the
> USB drive having the corrupted directory, around 200 such lookups in
> the corrupted directory took 14hrs and "ls -lR" was still running.
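
As a rough cross-check of the figures quoted so far, here is a minimal
user-space sketch in C. It assumes 32-byte directory entries (the same
divisor the report uses), the reported 16KB cluster size, and that every
failed lookup scans the whole directory. The helper names are
hypothetical and are not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

#define DIR_ENTRY_SIZE 32u            /* bytes per FAT directory entry */
#define CLUSTER_SIZE   (16u * 1024u)  /* 16KB clusters, as reported */

/* Number of 32-byte entry slots in a directory of dir_bytes bytes. */
static uint64_t dir_entry_slots(uint64_t dir_bytes)
{
	return dir_bytes / DIR_ENTRY_SIZE;
}

/* Number of clusters in the directory's chain, rounded up. */
static uint64_t dir_clusters(uint64_t dir_bytes)
{
	return (dir_bytes + CLUSTER_SIZE - 1) / CLUSTER_SIZE;
}

/* Whole hours spent on failed lookups that each scan the full directory. */
static uint64_t total_lookup_hours(uint64_t lookups, uint64_t secs_per_lookup)
{
	return lookups * secs_per_lookup / 3600;
}

/* Re-derive the numbers reported in the thread. */
static void check_reported_figures(void)
{
	const uint64_t dir_bytes = 1146896384ULL;

	assert(dir_entry_slots(dir_bytes) == 35840512ULL); /* slots per lookup */
	assert(dir_clusters(dir_bytes) == 70001ULL);       /* chain length */
	assert(total_lookup_hours(200, 240) == 13);        /* ~200 lookups */
}
```

With these assumptions, each failed lookup walks about 35.8 million
entry slots, and 200 lookups at 240 secs each come to roughly 13 hours,
which is in line with the reported 14hrs.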

Sounds like a totally corrupted FAT image, and the directory may have a
non-simple loop (e.g. there is a hardlink of a directory).

If so, I'm not sure we can detect it without a heavyweight check. Well,
the user should run fsck before mounting. However, if the fs can detect
this and stop early, it would be better.

BTW, if you run fsck, are the corrupted directories and the issue gone,
at least?

Anyway, fsck would be the main way. On the other hand, if we want to add
a mitigation for corruption, we would have to see much more detail about
this corruption. Can you put the corrupted image somewhere accessible
(only the metadata is needed) so we can reproduce it?

> The total number of directory entries in the corrupted directory of
> size 1146896384 bytes = 1146896384/32 = 35840512, so a lookup over
> 35840512 entries looks very exhaustive; therefore we have put a size
> check of the directory in fat_calc_dir_size() and prevented the
> directory traversal by returning -EIO.
>
> While browsing the corrupted directory (\CorruptedDIR) on a Windows 10
> PC, 2623 directory entries were listed and the timestamps were wrong.
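
The size check described in the quoted text above can be illustrated
with a small user-space model in C. This is only a sketch, not the
patch itself: the FAT is modelled as an in-memory array, the function
name is hypothetical, and the cap of 65536 entries (2MB) is an
assumption taken from the FAT specification's directory limit. The
limit used in the actual patch is not shown in this message.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define ENTRY_SIZE      32u      /* bytes per directory entry */
#define MAX_DIR_ENTRIES 65536u   /* assumed cap, not taken from the patch */
#define MAX_DIR_SIZE    ((uint64_t)MAX_DIR_ENTRIES * ENTRY_SIZE)
#define FAT32_MASK      0x0FFFFFFFu
#define FAT32_EOC       0x0FFFFFF8u  /* values >= this end a FAT32 chain */

/*
 * Walk a directory's cluster chain in an in-memory FAT and return the
 * directory size in bytes. Returns -EIO as soon as the size exceeds
 * MAX_DIR_SIZE or the chain points outside the table, so an oversized
 * or looping chain is rejected early instead of being fully traversed.
 */
static int64_t calc_dir_size_capped(const uint32_t *fat, size_t fat_len,
				    uint32_t start, uint32_t cluster_size)
{
	uint64_t size = 0;
	uint32_t c = start;

	assert(fat != NULL && cluster_size != 0);

	while (c < FAT32_EOC) {
		if (c < 2 || c >= fat_len)
			return -EIO;  /* invalid cluster number */
		size += cluster_size;
		if (size > MAX_DIR_SIZE)
			return -EIO;  /* too large for a valid directory */
		c = fat[c] & FAT32_MASK;
	}
	return (int64_t)size;
}
```

In this model the reported directory (70001 clusters of 16KB) would be
rejected after 129 clusters rather than after a full 1.06GB walk. A
simple loop in the cluster chain is bounded the same way, although a
cap like this does not by itself detect a directory that is linked from
more than one place.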

What happens if you recursively traverse the directories on Windows?
Does this issue happen on Windows too?

Thanks.
--
OGAWA Hirofumi <hirofumi@xxxxxxxxxxxxxxxxxx>