[PATCH] mm/hugetlb: fix potential race with try_memory_failure_hugetlb()

From: Miaohe Lin
Date: Wed Jul 10 2024 - 04:19:24 EST


There is a potential race between __update_and_free_hugetlb_folio() and
try_memory_failure_hugetlb():

CPU1                                        CPU2
__update_and_free_hugetlb_folio             try_memory_failure_hugetlb
                                             spin_lock_irq(&hugetlb_lock);
                                             __get_huge_page_for_hwpoison
                                              folio_test_hugetlb
                                               -- It's still hugetlb folio.
 folio_test_hugetlb_raw_hwp_unreliable
  -- raw_hwp_unreliable flag is not set yet.
                                              folio_set_hugetlb_hwpoison
                                               -- raw_hwp_unreliable flag might
                                                  be set.
                                             spin_unlock_irq(&hugetlb_lock);
 spin_lock_irq(&hugetlb_lock);
 __folio_clear_hugetlb(folio);
  -- Hugetlb flag is cleared but too late!
 spin_unlock_irq(&hugetlb_lock);

When the above race occurs, raw error pages will hit pcplists/buddy. Fix
this issue by deferring the folio_test_hugetlb_raw_hwp_unreliable() check
until __folio_clear_hugetlb() is done, because the raw_hwp_unreliable flag
cannot be set once the hugetlb folio flag is cleared.
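
The ordering argument above can be illustrated with a small userspace
model. This is only a sketch, not kernel code: struct toy_folio and the
toy_*() helpers are hypothetical names invented here, the "racer" callback
stands in for CPU2 running between the two steps on CPU1, and locking is
not modelled at all.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a folio: only the two flags in the race. */
struct toy_folio {
	bool hugetlb;            /* models the hugetlb folio flag */
	bool raw_hwp_unreliable; /* models the raw_hwp_unreliable flag */
};

/*
 * Models the memory-failure side: the unreliable flag can only be set
 * while the folio is still identified as hugetlb.
 */
static void toy_memory_failure(struct toy_folio *f)
{
	if (f->hugetlb)
		f->raw_hwp_unreliable = true;
}

/*
 * Old ordering: test the flag first, clear hugetlb afterwards.
 * Returns true if the folio would be handed back to the allocator.
 */
static bool toy_free_old(struct toy_folio *f,
			 void (*racer)(struct toy_folio *))
{
	if (f->raw_hwp_unreliable)
		return false;    /* leaked intentionally */
	if (racer)
		racer(f);        /* CPU2 runs inside the window */
	f->hugetlb = false;
	return true;             /* freed even if the racer set the flag */
}

/* New ordering: clear hugetlb first, test the flag afterwards. */
static bool toy_free_new(struct toy_folio *f,
			 void (*racer)(struct toy_folio *))
{
	if (racer)
		racer(f);        /* CPU2 runs before the flag is cleared */
	f->hugetlb = false;
	if (f->raw_hwp_unreliable)
		return false;    /* the late flag is now observed */
	return true;
}
```

In this model the old ordering frees a folio whose flag was set inside the
window, while the new ordering either observes the flag or clears hugetlb
before the racer can set it.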

Fixes: 32c877191e02 ("hugetlb: do not clear hugetlb dtor until allocating vmemmap")
Signed-off-by: Miaohe Lin <linmiaohe@xxxxxxxxxx>
Cc: <stable@xxxxxxxxxxxxxxx>
---
mm/hugetlb.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 9155144a654c..3d65b68cf78f 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1705,13 +1705,6 @@ static void __update_and_free_hugetlb_folio(struct hstate *h,
 	if (hstate_is_gigantic(h) && !gigantic_page_runtime_supported())
 		return;
 
-	/*
-	 * If we don't know which subpages are hwpoisoned, we can't free
-	 * the hugepage, so it's leaked intentionally.
-	 */
-	if (folio_test_hugetlb_raw_hwp_unreliable(folio))
-		return;
-
 	/*
 	 * If folio is not vmemmap optimized (!clear_flag), then the folio
 	 * is no longer identified as a hugetlb page. hugetlb_vmemmap_restore_folio
@@ -1739,6 +1732,13 @@ static void __update_and_free_hugetlb_folio(struct hstate *h,
 		spin_unlock_irq(&hugetlb_lock);
 	}
 
+	/*
+	 * If we don't know which subpages are hwpoisoned, we can't free
+	 * the hugepage, so it's leaked intentionally.
+	 */
+	if (folio_test_hugetlb_raw_hwp_unreliable(folio))
+		return;
+
 	/*
 	 * Move PageHWPoison flag from head page to the raw error pages,
 	 * which makes any healthy subpages reusable.
--
2.33.0