Re: [patch 5/7] radix-tree: lockless readside
From: Nick Piggin
Date: Thu Aug 11 2005 - 23:40:34 EST
On Thu, 2005-08-11 at 18:37 -0700, Paul E. McKenney wrote:
> On Thu, Aug 11, 2005 at 10:25:47PM +1000, Nick Piggin wrote:
> > 5/7
> >
> > --
> > SUSE Labs, Novell Inc.
> >
>
> > Make radix tree lookups safe to be performed without locks.
> > Readers are protected against nodes being deleted by using RCU
> > based freeing. Readers are protected against new node insertion
> > by using memory barriers to ensure the node itself will be
> > properly written before it is visible in the radix tree.
> >
> > Also introduce a lockfree gang_lookup_slot which will be used
> > by a future patch.
>
> Interesting approach! Don't claim to fully understand it, but
> see below (search for empty lines). In the meantime, some questions:
>
Basically, the main idea is this:
find_get_page (the current, locked version) will take the tree_lock,
elevate the refcount of the page currently in pagecache, and drop
the tree_lock.
tree_lock is used to a) ensure consistency of the radix tree, and
b) "pin" the page until we are able to take a reference to it.
After find_get_page returns the page with elevated refcount, it
has released the tree_lock, so there is no other atomicity /
serialisation provided other than the above 2 functions.
* Assuming my RCU'ed radix-tree is correct, that takes care of a.
* Now, we look up a 'struct page' that has existed in the pagecache
at *some* point in time (ie. dereference the pagep pointer).
* Then, take a reference on that struct page, regardless of whether
or not it is *now* in pagecache. This act of taking a reference
is also a memory barrier.
* Now we can again check if the page existed in pagecache _after_
taking that reference. If yes, we have a reference to the page
we want.
* If it is no longer in pagecache, we assume it is the wrong page.
Even if there is a new page in that part of the pagecache we can
return NULL, because the pagecache would have had to be NULL at
*some* point between the 2 times we load the pagep pointer.
With the above, we can meet the same requirements of the current
find_get_page. Which basically are:
x) If the page was ever[1] in pagecache, it may be returned
y) If the pagecache was ever[2] empty, NULL may be returned
[1], [2] - in accordance with the high level serialisation
schema, of course. The main point is that the tree_lock in
find_get_page can be taken at any arbitrary time and thus
doesn't provide anything past x and y.
Similar arguments for find_lock_page and find_get_pages, etc.
Anyway, that's the high level idea. All the rest of the machinery
is geared to handle the case where we have taken a reference to
the page after it has been removed from pagecache or used for
something else. This is admittedly fairly complex, but don't get
too hung up about it when getting an overview of it is supposed
to work.
--
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/