Re: [PATCH] vfs: fix race in rcu lookup of pruned dentry

From: Al Viro
Date: Sun Jul 17 2011 - 22:08:36 EST


On Sun, Jul 17, 2011 at 06:13:34PM -0700, Hugh Dickins wrote:

> And in the genuine negative dentry case: at present (with or without
> my or Linus's patches) there is no final nd->seq check, is there?

For stat() - no, there isn't. We really bail out with -ENOENT, no matter
what. Racy, which is what you are hitting.

> Having found a negative dentry, we just get out at that point with
> -ENOENT. Now, it's true that there might be a race in which the
> negative dentry is being made positive just as we're getting out;
> but since we're not digging in any deeper, that seems to me just
> a natural inevitable race, nothing to be guarded against.
>
> Whereas your "if (!*inode) goto unlazy;" is adding an additional
> sequence check, isn't it? Is that a race you see as important?
> If so, it is a different matter than I was ever considering.

Umm... I really don't think that trying to avoid that ->d_seq comparison
is worth doing. I understand your point, but... IMO it's safer to leave
->d_inode cleaning as-is, at least for now. Hell knows; I'd be a lot
more optimistic if fs/dcache.c had been in better shape. As it is, *and*
considering that we are within a spitting distance from -final...

It's not as if we'd been talking about a heavy contention or cacheline
bouncing here, let alone doing ->lookup(). I'm not entirely convinced
that it's a valid optimization in the first place (probably is, but I'm
seriously scared by the complexity we already have there), and I'm really
not fond of the idea of dealing with whatever subtle crap we might
discover with Linus' patch. Again, dcache is not in a healthy shape
right now; at this point dumb and straightforward is, IMO, better than
subtle and risking to step on toes of very odd code out there.

Hell, stuff that walks towards root with just rcu_read_lock, checking
->d_inode as it goes and making assumptions about the lifecycle
stages of the dentries it sees is not even all that weird, compared to
the code actually in there ;-/

Once we are done with code audit, sure, I'm fine with ->d_inode being
kept until dentry is actually freed. Any code that relies on that thing
being cleared is asking for trouble and should be rewritten anyway.
The only thing is, it needs to be found before we rewrite it...

Speaking of weird code - could somebody explain what the deuce is going
on in dget_parent()?
rcu_read_lock();
ret = dentry->d_parent;
if (!ret) {
rcu_read_unlock();
goto out;
}
in the very beginning looks wrong - we *never* have NULL ->d_parent on
dentries that might be seen by anyone; d_alloc(NULL, name) callers all
set ->d_parent to dentry itself ASAP and we never put NULL there afterwards.

What the hell is going on there? Note that callers of dget_parent() will
be really unimpressed by getting NULL from it; most of them will cheerfully
dereference what they get. It seems to be Nick's change; is there anybody
who would remember reasons for it?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/