[PATCH v4 3/7] mm,hwpoison: Try to narrow window race for free pages

From: Oscar Salvador
Date: Thu Sep 17 2020 - 04:27:51 EST


Aristeu Rozanski reported that a customer test case started
to report -EBUSY after the hwpoison rework patchset.

There is a race window between spotting a free page and taking it off
its buddy freelist, so by the time we try to take it off, the page may
already have been allocated.

This patch narrows that window: if the page was allocated under us, we
retry once so that the page is handled according to its new type.
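
To illustrate the retry-once idea outside the kernel, here is a minimal
userspace sketch. Everything in it is an assumption made for illustration:
struct fake_page, classify_page(), take_free_page(), handle_in_use_page()
and soft_offline_sketch() are hypothetical stand-ins, not the kernel's
get_any_page() or soft_offline_free_page().

```c
#include <stdbool.h>
#include <errno.h>

/* Hypothetical page states for this userspace sketch; not kernel types. */
enum page_state { PAGE_FREE, PAGE_IN_USE };

struct fake_page {
	enum page_state state;
	bool allocate_during_race;	/* simulate an allocation hitting the window */
};

/* Stand-in for the classification step: 0 for a free page, 1 for an in-use page. */
static int classify_page(const struct fake_page *p)
{
	return p->state == PAGE_IN_USE ? 1 : 0;
}

/*
 * Stand-in for taking a free page off its freelist. If the simulated
 * allocation fires first, the page is no longer free and we fail with -EBUSY.
 */
static int take_free_page(struct fake_page *p)
{
	if (p->allocate_during_race) {
		p->allocate_during_race = false;
		p->state = PAGE_IN_USE;
	}
	return p->state == PAGE_FREE ? 0 : -EBUSY;
}

/* Stand-in for the in-use path; always succeeds in this sketch. */
static int handle_in_use_page(struct fake_page *p)
{
	(void)p;
	return 0;
}

/* Classify, handle, and retry exactly once if the free-page path lost the race. */
static int soft_offline_sketch(struct fake_page *p)
{
	bool try_again = true;
	int ret;

retry:
	ret = classify_page(p);
	if (ret > 0) {
		ret = handle_in_use_page(p);
	} else if (ret == 0) {
		ret = take_free_page(p);
		if (ret && try_again) {
			try_again = false;
			goto retry;	/* re-classify: the page may be in use now */
		}
	}
	return ret;
}
```

The sketch stores the result of the free-page step in ret so that a second
failure still reaches the caller. That is a choice made for the sketch; the
hunk as posted calls soft_offline_free_page() inside the condition without
assigning its result.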

After this patch, Aristeu said the test cases work properly.

Signed-off-by: Oscar Salvador <osalvador@xxxxxxx>
Reported-by: Aristeu Rozanski <aris@xxxxxxxxx>
---
 mm/memory-failure.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index db61bdee9734..a2ccd3ba4015 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1917,6 +1917,7 @@ int soft_offline_page(unsigned long pfn, int flags)
 {
 	int ret;
 	struct page *page;
+	bool try_again = true;

 	if (!pfn_valid(pfn))
 		return -ENXIO;
@@ -1932,6 +1933,7 @@ int soft_offline_page(unsigned long pfn, int flags)
 		return 0;
 	}

+retry:
 	get_online_mems();
 	ret = get_any_page(page, pfn, flags);
 	put_online_mems();
@@ -1939,7 +1941,10 @@ int soft_offline_page(unsigned long pfn, int flags)
 	if (ret > 0)
 		ret = soft_offline_in_use_page(page);
 	else if (ret == 0)
-		ret = soft_offline_free_page(page);
+		if (soft_offline_free_page(page) && try_again) {
+			try_again = false;
+			goto retry;
+		}

 	return ret;
 }
--
2.26.2