[PATCH] mm, madvise: Ensure poisoned pages are removed from per-cpu lists

From: Mel Gorman
Date: Mon Aug 28 2017 - 09:34:20 EST


Wendy Wang reported off-list that a RAS HWPOISON-SOFT test case failed and
bisected it to commit 479f854a207c ("mm, page_alloc: defer debugging
checks of pages allocated from the PCP"). The problem is that a page that
was poisoned with madvise() is reused. That commit removed a check that
would trigger if DEBUG_VM was enabled, but re-enabling the check only
fixes the problem as a side-effect by printing a bad_page warning and
recovering.

The root of the problem is that madvise() can leave a poisoned page on
the per-cpu list. This patch drains all per-cpu lists after pages are
poisoned so that they will not be reused. Wendy reports that the test case
in question passes with this patch applied. While the drain could be done
in a targeted fashion, that would be over-complicated for such a rare
operation.
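
For illustration only, the failure mode and the effect of the drain can be
sketched with a small userspace toy model. This is not kernel code: every
toy_* name and constant below is hypothetical, the ordering of free and
poison is simplified, and the "slow path" is reduced to dropping poisoned
pages. It only assumes that the allocation fast path hands out cached pages
without inspecting them.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_CPUS   2	/* illustrative values, not kernel constants */
#define TOY_PCP_SLOTS 4

struct toy_page {
	bool poisoned;
};

/* One small cache of free pages per CPU, standing in for the PCP lists. */
struct toy_pcp {
	struct toy_page *slot[TOY_PCP_SLOTS];
	size_t count;
};

static struct toy_pcp toy_pcps[TOY_NR_CPUS];

/* Free fast path: park the page on the given CPU's list. */
static bool toy_free(int cpu, struct toy_page *page)
{
	struct toy_pcp *pcp = &toy_pcps[cpu];

	if (pcp->count == TOY_PCP_SLOTS)
		return false;
	pcp->slot[pcp->count++] = page;
	return true;
}

/* Alloc fast path: pop from the list without inspecting the page. */
static struct toy_page *toy_alloc(int cpu)
{
	struct toy_pcp *pcp = &toy_pcps[cpu];

	return pcp->count ? pcp->slot[--pcp->count] : NULL;
}

/*
 * Drain every per-CPU list. In this model the slow path simply drops
 * poisoned pages; healthy pages would go back to a global free list,
 * which is omitted here.
 */
static size_t toy_drain_all(void)
{
	size_t dropped = 0;

	for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++) {
		struct toy_pcp *pcp = &toy_pcps[cpu];

		while (pcp->count)
			if (pcp->slot[--pcp->count]->poisoned)
				dropped++;
	}
	return dropped;
}

/* Returns true if a poisoned page is handed back out by toy_alloc(). */
static bool toy_poisoned_page_reused(bool drain_after_poison)
{
	static struct toy_page page;
	struct toy_page *got;

	for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		toy_pcps[cpu].count = 0;

	page.poisoned = false;
	toy_free(0, &page);		/* page is cached on CPU 0's list */
	page.poisoned = true;		/* poisoning leaves it on that list */

	if (drain_after_poison)
		toy_drain_all();

	got = toy_alloc(0);
	return got != NULL && got->poisoned;
}
```

In this model, toy_poisoned_page_reused(false) returns true because the
poisoned page is still cached when the next allocation arrives, while
toy_poisoned_page_reused(true) returns false because the drain empties the
list first.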

Fixes: 479f854a207c ("mm, page_alloc: defer debugging checks of pages allocated from the PCP")
Reported-and-tested-by: Wang, Wendy <wendy.wang@xxxxxxxxx>
Cc: stable@xxxxxxxxxx
Signed-off-by: Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx>
---
mm/madvise.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/mm/madvise.c b/mm/madvise.c
index 23ed525bc2bc..4d7d1e5ddba9 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -613,6 +613,7 @@ static int madvise_inject_error(int behavior,
 		unsigned long start, unsigned long end)
 {
 	struct page *page;
+	struct zone *zone;
 
 	if (!capable(CAP_SYS_ADMIN))
 		return -EPERM;
@@ -646,6 +647,11 @@ static int madvise_inject_error(int behavior,
 		if (ret)
 			return ret;
 	}
+
+	/* Ensure that all poisoned pages are removed from per-cpu lists */
+	for_each_populated_zone(zone)
+		drain_all_pages(zone);
+
 	return 0;
 }
 #endif