Re: [RFC] readdir mess

From: Al Viro
Date: Tue Aug 12 2008 - 20:04:48 EST


On Tue, Aug 12, 2008 at 02:04:25PM -0700, Linus Torvalds wrote:

> But the actual _change_ I talked about would be to try to avoid using
> "buf.error" entirely, and just make all the (many, I know) filesystem
> readdir() functions return the callback value instead of 0. Which would
> mean that most of the users would be able to drop their "buf.error" use
> entirely, along with the ugly
>
> if (error >= 0)
> error = buf.error;
>
> that I have in this patch.

Careful; e.g. coda returns the number of entries read, _not_ 0. So it'd
take more than just "return what the callback had given us".

FWIW, I'm more concerned about the other callers of vfs_readdir();
getdents(2) will get tested on pretty much any fs, so if that breaks
we'll know it (OK, short of ->readdir() doing something spectaculary
stupid when given unusual arguments).

_IF_ we are changing things in ->readdir() (and your "pass return value of
filldir to caller of ->readdir()" certainly qualifies), we'd better make
sure that old instances break visibly and immediately; preferably - at
compile time.

As for whether we want to bother... I've looked through about half of the
->readdir instances. We _do_ want to touch them, with cattle prod if nothing
else. And that's just a cursory look...

9p: touching belief that f_pos can't be changed under us.
adfs: ditto.
affs: user-triggerable affs_warning() (just lseek to 0x10001 and you've
got it), plus fun race: default_lseek() sets ->f_pos before clearing
->f_version and we have no exclusion whatsoever (and then there is an
SMP ordering race). Besides, it returns the number of entries read.
autofs4:broken lseek (uses dcache_readdir() without the matching ->llseek)
befs: pulls out early (after one call of filldir, no matter
what), advances ->f_pos no matter what filldir returns.
bfs: fs-wide exclusion between ->readdir(), -EBADF(!) on invalid offset,
believes that ->f_pos can't change under it.
cifs: f_pos races
coda: returns fsck knows what (number of entries, mostly)
cramfs: blind assumption that on-disk data would be valid; entry crossing a
block boundary => fucked.
ecryptfs: badly broken
efs: f_pos races (BKL doesn't help if you block), ignores the return value
filldir completely, fucked on corrupted data (it's a bit too late
to check that we have bogus offset of name in block after we'd already
accessed the contents). Alignment issues, while we are at it (or
EFS_SLOTAT() is buggy).
ext3: take a look at comments around filldir call. Yes, they are _that_
ancient, and so's the logics around revalidate. ext2 is sane, but
that hadn't propagated. Refuses to go through more than one block,
BTW. Revalidation loop is buggered if we have corrupt data, while
we are at it.
ext4: ditto
freevxfs: AFAICS simply bogus (grep for nblocks there).
hfs: at least missing checks for hfs_bnode_read() failure. And I'm not
at all sure that hfs_mac2asc() use is safe there. BTW, open_dir_list
handling appears to be odd - how the hell does decrementing ->f_pos
help anything? And hfs_dir_release() removes from list without any
locks, so that's almost certainly racy as well.
hfsplus: ditto
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/