Re: [patch 5/7] radix-tree: lockless readside

From: Nick Piggin
Date: Thu Aug 11 2005 - 23:00:59 EST


On Thu, 2005-08-11 at 18:37 -0700, Paul E. McKenney wrote:
> On Thu, Aug 11, 2005 at 10:25:47PM +1000, Nick Piggin wrote:
> > 5/7
> >
> > --
> > SUSE Labs, Novell Inc.
> >
>
> > Make radix tree lookups safe to be performed without locks.
> > Readers are protected against nodes being deleted by using RCU
> > based freeing. Readers are protected against new node insertion
> > by using memory barriers to ensure the node itself will be
> > properly written before it is visible in the radix tree.
> >
> > Also introduce a lockfree gang_lookup_slot which will be used
> > by a future patch.
>
> Interesting approach! Don't claim to fully understand it, but
> see below (search for empty lines). In the meantime, some questions:
>

Thanks for comments and review - very helpful.

> o What exactly is RCU protecting? My first guess is that
> it protects the pointers and internal nodes of the
> radix tree, but not the objects in the leaves of the
> trees (in other words, the things pointed to by the
> return value from things like radix_tree_lookup_slot()).
>

Ah OK, this wasn't initially apparent to me either, however I
believe it is needed.

Indeed we are protecting the internal nodes of the radix tree,
however the speculative get operation needs to dereference a
*pointer* to the page. The `struct page` itself doesn't go away,
and thus doesn't need any protection. However the *pointer* can
go away - it is contained within a radix tree node.

> But if this really is the case, then the rcu_read_lock() &
> rcu_read_unlock() pairs can be pushed down into
> radix_tree_lookup_slot() and friends.
>
> o The current code structure would lead me to believe that
> page_cache_get_speculative() is protected by RCU, but
> I don't see the corresponding call_rcu() or synchronize_rcu()
> that would cause this protection to be required.
>

I believe this should answer your concern about the RCU protection
placement.

One thing we could do differently is this: instead of receiving
a pointer to the struct page from the radix tree lookup, we could
actually do as you say and push the rcu locking into the radix
tree lookup.

Then we would do a speculative get_page on that struct page,
finally we would lookup the radix tree again to see whether the
same page is still in the pagecache.

I discounted this idea not just for efficiency, but also because
it would require either a 2 phase speculative get (or a speculative
undo), or would require teaching speculative get about the pagecache
radix tree.

> > @@ -208,9 +219,13 @@ static int radix_tree_extend(struct radi
> > tag_set(node, tag, 0);
> > }
> >
> > + newheight = root->height+1;
> > + node->height = newheight;
> > node->count = 1;
> > + /* Make ->height visible before node visible via ->rnode */
>
> > + smp_wmb();
> > root->rnode = node;
>
> The prior two lines should instead be:
>
> rcu_assign_pointer(root->rnode, node);
>

Great, thanks very much.

--
SUSE Labs, Novell Inc.



Send instant messages to your online friends http://au.messenger.yahoo.com
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/