Re: [PATCH v4 0/3] Only free healthy pages in high-order has_hwpoisoned folio

From: Lorenzo Stoakes

Date: Tue May 26 2026 - 12:38:08 EST


On Mon, Feb 23, 2026 at 03:17:57PM -0800, Jiaqi Yan wrote:
> Hi Vlastimil,
>
> Could you and other page_alloc.c reviewers share your thoughts on this
> patchset? Thanks!

Hi Jiaqi,

I guess you are blocked on review here. Did you intend to return to this?

I'd suggest attempting a respin to get some movement here, as this clearly
slipped through the gaps last cycle.

Cheers, Lorenzo

>
> On Mon, Feb 2, 2026 at 11:41 AM Jiaqi Yan <jiaqiyan@xxxxxxxxxx> wrote:
> >
> > At the end of dissolve_free_hugetlb_folio(), when a free HugeTLB
> > folio becomes non-HugeTLB, it is released to the buddy allocator
> > as a high-order folio, e.g. a folio that contains 262144 pages
> > if the folio was a 1G HugeTLB hugepage.
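
As a quick check of the page count quoted above, here is a small userspace C sketch of the arithmetic. It assumes a 4 KiB base page size, which the cover letter does not state; SKETCH_PAGE_SIZE, sketch_nr_pages() and sketch_order() are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4 KiB base pages; other configurations use other sizes. */
#define SKETCH_PAGE_SIZE ((size_t)4096)

/* Number of base pages covered by a hugepage of the given size in bytes. */
size_t sketch_nr_pages(size_t hugepage_bytes)
{
    assert(hugepage_bytes % SKETCH_PAGE_SIZE == 0);
    return hugepage_bytes / SKETCH_PAGE_SIZE;
}

/* Order of a power-of-two page count, i.e. log2(nr_pages). */
unsigned int sketch_order(size_t nr_pages)
{
    unsigned int order = 0;

    /* Only power-of-two page counts have a well-defined order. */
    assert(nr_pages != 0 && (nr_pages & (nr_pages - 1)) == 0);
    while (((size_t)1 << order) < nr_pages)
        order++;
    return order;
}
```

Under that assumption a 1G hugepage is 2^30 / 2^12 = 2^18 = 262144 base pages (order 18), and a 2M hugepage is 512 base pages (order 9).
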
> >
> > This is problematic if the HugeTLB hugepage contained HWPoison
> > subpages. In that case, since buddy allocator does not check
> > HWPoison for non-zero-order folio, the raw HWPoison page can
> > be given out with its buddy page and be re-used by either
> > kernel or userspace.
> >
> > Memory failure recovery (MFR) in the kernel does attempt to take
> > the raw HWPoison page off the buddy allocator after
> > dissolve_free_hugetlb_folio(). However, there is always a time
> > window between dissolve_free_hugetlb_folio() freeing a HWPoison
> > high-order folio to the buddy allocator and MFR taking the raw
> > HWPoison page off the buddy allocator.
> >
> > Another similar situation is when a transparent huge page (THP)
> > is handled by MFR but splitting failed. Such THP will eventually
> > be released to buddy allocator when owning userspace processes
> > are gone, but with certain subpages having HWPoison [9].
> >
> > One obvious way to avoid both problems is to add page sanity
> > checks in the page allocation or free path. However, that goes
> > against past efforts to reduce sanity check overhead [1,2,3].
> >
> > Introduce free_has_hwpoisoned() to free only the healthy pages
> > and exclude the HWPoison ones in the high-order folio.
> > free_has_hwpoisoned() happens at the end of free_pages_prepare(),
> > which already deals with both decomposing the original compound
> > page and updating page metadata like alloc tag and page owner.
> > For performance reasons, it is only applied when PG_has_hwpoisoned
> > indicates that the folio contains HWPoison page(s).
> > The idea is to iterate through the sub-pages of the folio to
> > identify contiguous ranges of healthy pages. Instead of freeing
> > pages one by one, decompose healthy ranges into the largest
> > possible blocks. Each block is freed via free_one_page() directly.
> >
> > free_has_hwpoisoned() has linear time complexity wrt the number
> > of pages in the folio. While the power-of-two decomposition
> > ensures that the number of calls to the buddy allocator is
> > logarithmic for each contiguous healthy range, the mandatory
> > linear scan of pages to identify PageHWPoison defines the
> > overall time complexity.
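
For readers following along, the scan-and-decompose idea described above can be sketched in userspace C. This is only an illustration under stated assumptions, not the code from the patch: the poisoned[] array stands in for per-page PageHWPoison checks, page indices are relative to the start of the (naturally aligned) folio, recording a block stands in for a free_one_page() call, and struct sketch_block, split_healthy_range(), collect_healthy_blocks() and SKETCH_MAX_ORDER are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cap standing in for the buddy allocator's maximum order. */
#define SKETCH_MAX_ORDER 10

/* One block that would be handed to the buddy allocator. */
struct sketch_block {
    size_t start;        /* page index relative to the start of the folio */
    unsigned int order;  /* block covers 2^order pages */
};

/*
 * Split the healthy range [start, end) into the largest naturally aligned
 * power-of-two blocks. Returns the number of blocks needed; at most cap of
 * them are written to out.
 */
size_t split_healthy_range(size_t start, size_t end,
                           struct sketch_block *out, size_t cap)
{
    size_t n = 0;

    while (start < end) {
        unsigned int order = 0;

        /* Grow the order while the block stays aligned and inside the range. */
        while (order < SKETCH_MAX_ORDER &&
               (start & (((size_t)2 << order) - 1)) == 0 &&
               start + ((size_t)2 << order) <= end)
            order++;

        if (n < cap) {
            out[n].start = start;
            out[n].order = order;
        }
        n++;
        start += (size_t)1 << order;
    }
    return n;
}

/*
 * Linear scan over all pages: skip poisoned pages, and split every maximal
 * run of healthy pages into blocks. Returns the total number of blocks.
 */
size_t collect_healthy_blocks(const bool *poisoned, size_t nr_pages,
                              struct sketch_block *out, size_t cap)
{
    size_t n = 0, i = 0;

    assert(poisoned != NULL);
    while (i < nr_pages) {
        size_t start;

        if (poisoned[i]) {
            i++;
            continue;
        }
        start = i;
        while (i < nr_pages && !poisoned[i])
            i++;
        /* Once out is full, keep counting blocks without writing them. */
        n += split_healthy_range(start, i,
                                 n < cap ? out + n : out,
                                 n < cap ? cap - n : 0);
    }
    return n;
}
```

With 16 pages and page 5 poisoned, the healthy ranges [0, 5) and [6, 16) split into blocks of order 2, 0, 1 and 3, so the 15 healthy pages are handed back in four calls while the scan still visits all 16 pages.
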
> >
> > I tested with some test-only code [4] and hugetlb-mfr [5], by
> > checking the status of pcplist and freelist immediately after
> > dissolve_free_hugetlb_folio() dissolves a free 2M or 1G hugetlb
> > page that contains 1~8 HWPoison raw pages:
> >
> > - HWPoison pages are excluded by free_has_hwpoisoned().
> >
> > - Some healthy pages can be in zone->per_cpu_pageset (pcplist)
> > because pcp_count is not high enough. Many healthy pages are
> > in some order's zone->free_area[order].free_list (freelist).
> >
> > - In rare cases, some healthy pages are in neither pcplist
> > nor freelist. My best guess is they are allocated before
> > the test checks.
> >
> > To illustrate the latency free_has_hwpoisoned() adds to the
> > memory freeing path, I measured its time cost with 8 HWPoison
> > pages, using the instrumentation code in [4], over 20 sample runs:
> >
> > - Has HWPoison path: mean=1448us, stdev=174us
> >
> > - No HWPoison path: mean=66us, stdev=6us
> >
> > free_has_hwpoisoned() is around 22x the baseline. It is far from
> > triggering soft lockup, and the cost is fair for handling
> > exceptional hardware memory errors.
> >
> > With free_has_hwpoisoned() ensuring HWPoison pages never make it
> > into the buddy allocator, MFR doesn't need to take_page_off_buddy()
> > anymore after dissolving HWPoison hugepages. So replace
> > __page_handle_poison() with a new __hugepage_handle_poison() for
> > HugeTLB specific call sites.
> >
> > Based on commit 8dfce8991b95d ("Merge tag 'pinctrl-v6.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl")
> >
> > Changelog
> >
> > v3 [8] -> v4
> >
> > - Address comments from Zi Yan, Miaohe Lin, Harry Yoo.
> >
> > - Set has_hwpoisoned flag after introducing free_has_hwpoisoned().
> >
> > - Unwrap free_pages_prepare_has_hwpoisoned() into free_pages_prepare().
> >
> > - If the folio has HWPoison, its healthy pages will be freed with
> > FPI_NONE right in free_pages_prepare(), which returns false to
> > indicate that the caller should not proceed with its own freeing action.
> >
> > - Rework the commit on __page_handle_poison(). Only change the handling
> > for HWPoison HugeTLB page, leaving free buddy page and soft offline
> > handling alone.
> >
> > v2 [7] -> v3:
> >
> > - Address comments from Matthew Wilcox, Harry Yoo, Miaohe Lin.
> >
> > - Let free_has_hwpoisoned() happen after free_pages_prepare(),
> > which helps with decomposing the original compound page,
> > and with page metadata like alloc tag and page owner.
> >
> > - Tested with "page_owner=on" and CONFIG_MEM_ALLOC_PROFILING*=y.
> >
> > - Wrap checking PG_has_hwpoisoned and free_has_hwpoisoned() into
> > free_pages_prepare_has_hwpoisoned(), which replaces
> > free_pages_prepare() calls in free_frozen_pages().
> >
> > - Rename free_has_hwpoison_page() to free_has_hwpoisoned().
> >
> > - Measure latency added by free_has_hwpoisoned().
> >
> > - Ensure struct page *end is only used for pointer arithmetic,
> > instead of being accessed as a page.
> >
> > - Refactor page_handle_poison() instead of just __page_handle_poison().
> >
> > v1 [6] -> v2:
> >
> > - Total reimplementation based on discussions with Matthew Wilcox,
> > Harry Yoo, Zi Yan, etc.
> >
> > - hugetlb_free_hwpoison_folio() => free_has_hwpoison_pages().
> >
> > - Utilize has_hwpoisoned flag to tell buddy allocator a high-order
> > folio contains HWPoison.
> >
> > - Simplify __page_handle_poison() given that the HWPoison page(s)
> > won't be freed within high-order folio.
> >
> > [1] https://lore.kernel.org/linux-mm/1460711275-1130-15-git-send-email-mgorman@xxxxxxxxxxxxxxxxxxx
> > [2] https://lore.kernel.org/linux-mm/1460711275-1130-16-git-send-email-mgorman@xxxxxxxxxxxxxxxxxxx
> > [3] https://lore.kernel.org/all/20230216095131.17336-1-vbabka@xxxxxxx
> > [4] https://drive.google.com/file/d/1CzJn1Cc4wCCm183Y77h244fyZIkTLzCt/view?usp=sharing
> > [5] https://lore.kernel.org/linux-mm/20251116013223.1557158-3-jiaqiyan@xxxxxxxxxx
> > [6] https://lore.kernel.org/linux-mm/20251116014721.1561456-1-jiaqiyan@xxxxxxxxxx
> > [7] https://lore.kernel.org/linux-mm/20251219183346.3627510-1-jiaqiyan@xxxxxxxxxx
> > [8] https://lore.kernel.org/linux-mm/20260112004923.888429-1-jiaqiyan@xxxxxxxxxx
> > [9] https://lore.kernel.org/linux-mm/20260113205441.506897-1-boudewijn@xxxxxxxxxxxxxx
> >
> > Jiaqi Yan (3):
> > mm/page_alloc: only free healthy pages in high-order has_hwpoisoned
> > folio
> > mm/memory-failure: set has_hwpoisoned flags on dissolved HugeTLB folio
> > mm/memory-failure: skip take_page_off_buddy after dissolving HWPoison
> > HugeTLB page
> >
> > include/linux/page-flags.h | 2 +-
> > mm/memory-failure.c | 37 +++++++++--
> > mm/page_alloc.c | 133 ++++++++++++++++++++++++++++++++++++-
> > 3 files changed, 163 insertions(+), 9 deletions(-)
> >
> > --
> > 2.53.0.rc2.204.g2597b5adb4-goog
> >
>
>