Re: [RFC][PATCH] HWPOISON: remove the unsafe __set_page_locked()

From: Wu Fengguang
Date: Tue Sep 29 2009 - 01:16:44 EST


On Mon, Sep 28, 2009 at 04:44:01PM +0800, Wu Fengguang wrote:
> On Mon, Sep 28, 2009 at 03:20:25AM +0800, Nick Piggin wrote:

> > One other thing to keep in mind that I will mention is that I am
> > going to push in a patch to the page allocator to allow callers
> > to avoid the refcounting (atomic_dec_and_test) in page lifetime,
> > which is especially important for SLUB and takes more cycles off
> > the page allocator...
> >
> > I don't know exactly what you're going to do after that to get a
> > stable reference to slab pages. I guess you can read the page
> > flags and speculatively take some slab locks and recheck etc...
>
> For reliably we could skip page lock on zero refcounted pages.
>
> We may lose the PG_hwpoison bit on races with __SetPageSlub*, however
> it should be an acceptable imperfection.

I'd like to propose this fix for 2.6.32, which can do 100% correctness
for the discussed races :)

In brief it is

if (is not lru page)
return and don't touch page lock;

Any comments?

Thanks,
Fengguang

---
HWPOISON: return early on non-LRU pages

This avoids unnecessary races with __set_page_locked() and
__SetPageSlab*() and maybe more non-atomic page flag operations.

Signed-off-by: Wu Fengguang <fengguang.wu@xxxxxxxxx>
---
mm/memory-failure.c | 54 ++++++++++++++----------------------------
1 file changed, 19 insertions(+), 35 deletions(-)

--- sound-2.6.orig/mm/memory-failure.c 2009-09-29 12:27:36.000000000 +0800
+++ sound-2.6/mm/memory-failure.c 2009-09-29 12:32:52.000000000 +0800
@@ -327,16 +327,6 @@ static const char *action_name[] = {
};

/*
- * Error hit kernel page.
- * Do nothing, try to be lucky and not touch this instead. For a few cases we
- * could be more sophisticated.
- */
-static int me_kernel(struct page *p, unsigned long pfn)
-{
- return DELAYED;
-}
-
-/*
* Already poisoned page.
*/
static int me_ignore(struct page *p, unsigned long pfn)
@@ -370,9 +360,6 @@ static int me_pagecache_clean(struct pag
int ret = FAILED;
struct address_space *mapping;

- if (!isolate_lru_page(p))
- page_cache_release(p);
-
/*
* For anonymous pages we're done the only reference left
* should be the one m_f() holds.
@@ -498,30 +485,18 @@ static int me_pagecache_dirty(struct pag
*/
static int me_swapcache_dirty(struct page *p, unsigned long pfn)
{
- int ret = FAILED;
-
ClearPageDirty(p);
/* Trigger EIO in shmem: */
ClearPageUptodate(p);

- if (!isolate_lru_page(p)) {
- page_cache_release(p);
- ret = DELAYED;
- }
-
- return ret;
+ return DELAYED;
}

static int me_swapcache_clean(struct page *p, unsigned long pfn)
{
- int ret = FAILED;
-
- if (!isolate_lru_page(p)) {
- page_cache_release(p);
- ret = RECOVERED;
- }
delete_from_swap_cache(p);
- return ret;
+
+ return RECOVERED;
}

/*
@@ -576,13 +551,6 @@ static struct page_state {
{ reserved, reserved, "reserved kernel", me_ignore },
{ buddy, buddy, "free kernel", me_free },

- /*
- * Could in theory check if slab page is free or if we can drop
- * currently unused objects without touching them. But just
- * treat it as standard kernel for now.
- */
- { slab, slab, "kernel slab", me_kernel },
-
#ifdef CONFIG_PAGEFLAGS_EXTENDED
{ head, head, "huge", me_huge_page },
{ tail, tail, "huge", me_huge_page },
@@ -775,6 +743,22 @@ int __memory_failure(unsigned long pfn,
}

/*
+ * We ignore non-LRU pages for good reasons.
+ * - PG_locked is only well defined for LRU pages and a few others
+ * - to avoid races with __set_page_locked()
+ * - to avoid races with __SetPageSlab*() (and more non-atomic ops)
+ * The check (unnecessarily) ignores LRU pages being isolated and
+ * walked by the page reclaim code, however that's not a big loss.
+ */
+ if (!PageLRU(p))
+ lru_add_drain_all();
+ if (isolate_lru_page(p)) {
+ action_result(pfn, "non LRU", IGNORED);
+ return -EBUSY;
+ }
+ page_cache_release(p);
+
+ /*
* Lock the page and wait for writeback to finish.
* It's very difficult to mess with pages currently under IO
* and in many cases impossible, so we just avoid it here.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/