Re: [PATCH v3 1/2] mm/memory-failure: fix missing ->mf_stats count in hugetlb poison

From: jane . chu

Date: Tue Dec 23 2025 - 13:29:09 EST



On 12/23/2025 12:50 AM, David Hildenbrand (Red Hat) wrote:
On 12/23/25 02:21, Jane Chu wrote:
When a newly poisoned subpage ends up in an already poisoned hugetlb
folio, 'num_poisoned_pages' is incremented, but the per node ->mf_stats
is not. Fix the inconsistency by designating action_result() to update
them both.

While at it, define __get_huge_page_for_hwpoison() return values in terms
of symbol names for better readibility. Also rename
folio_set_hugetlb_hwpoison() to hugetlb_update_hwpoison() since the
function does more than the conventional bit setting and the fact
three possible return values are expected.

Fixes: 18f41fa616ee4 ("mm: memory-failure: bump memory failure stats to pglist_data")
Cc: <stable@xxxxxxxxxxxxxxx>
Signed-off-by: Jane Chu <jane.chu@xxxxxxxxxx>
---
v2 -> v3:
   No change.
v1 -> v2:
   adapted David and Liam's comment, define __get_huge_page_for_hwpoison()
return values in terms of symbol names instead of naked integers for better
readibility.  #define instead of enum is used since the function has footprint
outside MF, just try to limit the MF specifics local.
   also renamed folio_set_hugetlb_hwpoison() to hugetlb_update_hwpoison()
since the function does more than the conventional bit setting and the
fact three possible return values are expected.

---
  mm/memory-failure.c | 56 ++++++++++++++++++++++++++-------------------
  1 file changed, 33 insertions(+), 23 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index fbc5a01260c8..8b47e8a1b12d 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1883,12 +1883,18 @@ static unsigned long __folio_free_raw_hwp(struct folio *folio, bool move_flag)
      return count;
  }
-static int folio_set_hugetlb_hwpoison(struct folio *folio, struct page *page)
+#define    MF_HUGETLB_ALREADY_POISONED    3  /* already poisoned */
+#define    MF_HUGETLB_ACC_EXISTING_POISON    4  /* accessed existing poisoned page */

What happened to the idea of using an enum?

I briefly mentioned the reason of using #define instead of enum in the v1 -> v2 comment, more below.



+/*
+ * Set hugetlb folio as hwpoisoned, update folio private raw hwpoison list
+ * to keep track of the poisoned pages.
+ */
+static int hugetlb_update_hwpoison(struct folio *folio, struct page *page)
  {
      struct llist_head *head;
      struct raw_hwp_page *raw_hwp;
      struct raw_hwp_page *p;
-    int ret = folio_test_set_hwpoison(folio) ? -EHWPOISON : 0;
+    int ret = folio_test_set_hwpoison(folio) ? MF_HUGETLB_ALREADY_POISONED : 0;
      /*
       * Once the hwpoison hugepage has lost reliable raw error info,
@@ -1896,20 +1902,18 @@ static int folio_set_hugetlb_hwpoison(struct folio *folio, struct page *page)
       * so skip to add additional raw error info.
       */
      if (folio_test_hugetlb_raw_hwp_unreliable(folio))
-        return -EHWPOISON;
+        return MF_HUGETLB_ALREADY_POISONED;
+
      head = raw_hwp_list_head(folio);
      llist_for_each_entry(p, head->first, node) {
          if (p->page == page)
-            return -EHWPOISON;
+            return MF_HUGETLB_ACC_EXISTING_POISON;
      }
      raw_hwp = kmalloc(sizeof(struct raw_hwp_page), GFP_ATOMIC);
      if (raw_hwp) {
          raw_hwp->page = page;
          llist_add(&raw_hwp->node, head);
-        /* the first error event will be counted in action_result(). */
-        if (ret)
-            num_poisoned_pages_inc(page_to_pfn(page));
      } else {
          /*
           * Failed to save raw error info.  We no longer trace all
@@ -1955,32 +1959,30 @@ void folio_clear_hugetlb_hwpoison(struct folio *folio)
      folio_free_raw_hwp(folio, true);
  }
+#define    MF_HUGETLB_FREED            0    /* freed hugepage */
+#define    MF_HUGETLB_IN_USED            1    /* in-use hugepage */
+#define    MF_NOT_HUGETLB                2    /* not a hugepage */

If you're already dealing with negative error codes, "MF_NOT_HUGETLB" nicely
translated to -EINVAL.

Agreed, thanks, will make the change in next round.


But I wonder if it would be cleaner to just define all values in an enum and return
that enum instead of an int from the functions.

enum md_hugetlb_status {
    MF_HUGETLB_INVALID,        /* not a hugetlb folio */
    MF_HUGETLB_BUSY,        /* busy, retry later */
    MF_HUGETLB_FREED,        /* hugetlb folio was freed */
    MF_HUGETLB_IN_USED,        /* ??? no idea what that really means */
    MF_HUGETLB_FOLIO_PRE_POISONED,    /* folio already poisoned, per- page information unclear */
    MF_HUGETLB_PAGE_PRE_POISONED,    /* exact page already poisoned */
}

enum is nicer than #define for sure, the only concern I have is that, this enum as a return type from _get_huge_page_for_hwpoison()/
get_huge_page_for_hwpoison() will leave footprint in
include/linux/mm.h that declares _get_huge_page_for_hwpoison(),
include/linux/hugetlb.h that declares/defines get_huge_page_for_hwpoison(),
and mm/hugetlb.c.

Is there a neat way? or, am I nitpicking? :)

thanks,
-jane