On Sat, Jan 11, 2003 at 11:00:38PM +0000, Jeff Garzik wrote:
> On Sat, Jan 11, 2003 at 02:43:08PM -0800, Andrew Morton wrote:
> > - dcache-RCU.
> >
> > This was recently updated to fix a rename race. It's quite stable. I'm
> > not sure where we stand wrt merging it now. Al seems to have disappeared.
>
> I talked to him in person last week, and this was one of the topics of
> discussion. He seemed to think it was fundamentally unfixable. He
> proceed to explain why, and then explained the scheme he worked out to
> improve things. Unfortunately my memory cannot do justice to the
> details.
The rename race is fixed now. Yes, it was unfixable using *existing* RCU
techniques, but one has to invent new tricks when the old bag of
tricks is empty :)
Fundamentally what happens is that rename may be *two* updates - delete
from one hash chain and insert into another hash chain. In order for
lockfree traversal to work correctly, you must have a grace period after
each update. If we do a grace period between these two updates in a rename,
it slows down renames to unacceptable levels. So we had a problem there.
The solution lies in the dcache itself - it has a fast path (cached_lookup)
and a slow path (real_lookup). So all we had to do was to detect that a
rename had happened to the dentry while we looked it up lockfree. This
is done by a generation counter (d_move_count) in the dentry and is
protected by the per-dentry spinlock which we take during rename and
a successful cache lookup.
Two things can happen due to the rename race - lookup incorrectly succeeds
or lookup incorrectly fails. The success case is easily handled by
the lockfree lookup code that looks like this -
for the dentries in the hash chain {
... More stuff....
move_count = dentry->d_move_count;
if (dentry name matches) {
/* lookup succeeds */
spin_lock(&dentry->d_lock);
if (move_count != dentry->d_move_count) {
/*
* A rename happened while looking up lockfree and
* we now cannot gurantee
* that the lookup is correct
*/
spin_unlock(&dentry->d_lock);
return slow_lookup();
}
....
....
}
... More stuff....
}
If the lookup fails due to rename race, then there will anyway be a
slow real_lookup which is serialized with rename.
Maneesh did a lot of testing using many ramfs and many millions of renames
with millions of lookups going on at the same time and slow path was hit only
100 times or so. For practical workloads, this should have absolutely no
performance impact.
Thanks
Dipankar
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
This archive was generated by hypermail 2b29 : Wed Jan 15 2003 - 22:00:44 EST