Re: [patch 2] mm: speculative get_page

From: Nick Piggin
Date: Mon Jun 27 2005 - 19:04:50 EST


William Lee Irwin III wrote:
On Mon, Jun 27, 2005 at 04:32:38PM +1000, Nick Piggin wrote:

+static inline struct page *page_cache_get_speculative(struct page **pagep)
+{
+ struct page *page;
+
+ preempt_disable();
+ page = *pagep;
+ if (!page)
+ goto out_failed;
+
+ if (unlikely(get_page_testone(page))) {
+ /* Picked up a freed page */
+ __put_page(page);
+ goto out_failed;
+ }


So you pick up 0->1 refcount transitions.


Yep ie. a page that's freed or being freed.


On Mon, Jun 27, 2005 at 04:32:38PM +1000, Nick Piggin wrote:

+ /*
+ * preempt can really be enabled here (only needs to be disabled
+ * because page allocation can spin on the elevated refcount, but
+ * we don't want to hold a reference on an unrelated page for too
+ * long, so keep preempt off until we know we have the right page
+ */
+
+ if (unlikely(PageFreeing(page)) ||


SetPageFreeing is only done in shrink_list(), so other pages in the
buddy bitmaps and/or pagecache pages freed by other methods may not

It is also done by remove_exclusive_swap_page, although that hunk
leaked into a later patch (#5), sorry.

Other methods (eg truncate) don't seem to have an atomicity guarantee
anyway - ie. it is valid to pick up a reference on a page that is
just about to get truncated. PageFreeing is only used when some code
is making an assumption about the number of users of the page.

be found by this. There's also likely trouble with higher-order pages.


There isn't because higher order pages aren't used for pagecache.


On Mon, Jun 27, 2005 at 04:32:38PM +1000, Nick Piggin wrote:

+ unlikely(page != *pagep)) {
+ /* Picked up a page being freed, or one that's been reused */
+ put_page(page);
+ goto out_failed;
+ }
+ preempt_enable();
+
+ return page;
+
+out_failed:
+ preempt_enable();
+ return NULL;
+}


page != *pagep won't be reliably tripped unless the pagecache
modification has the appropriate memory barriers.


There are appropriate memory barriers: the radix tree is
modified uner the rwlock/spinlock, and this function has
a memory barrier before testing page != *pagep.

The lockless radix tree lookups are a harder problem than this, and
the implementation didn't look promising. I have other problems to deal
with so I'm not going to go very far into this.


What's wrong with the lockless radix tree lookups?

While I agree that locklessness is the right direction for the
pagecache to go, this RFC seems to have too far to go to use it to
conclude anything about the subject.


You don't seem to have looked enough to conclude anything about it.

--
SUSE Labs, Novell Inc.

Send instant messages to your online friends http://au.messenger.yahoo.com -
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/