Re: [PATCH splitout] mm: memory-failure: serialize TestSetPageHWPoison with zone->lock

From: Michael S. Tsirkin

Date: Tue Jun 09 2026 - 16:25:12 EST


On Tue, Jun 09, 2026 at 11:10:20AM -0700, Andrew Morton wrote:
> On Tue, 9 Jun 2026 06:12:49 -0400 "Michael S. Tsirkin" <mst@xxxxxxxxxx> wrote:
>
> > TestSetPageHWPoison() is called without zone->lock, so its atomic
> > update to page->flags can race with non-atomic flag operations
> > that run under zone->lock in the buddy allocator.
> >
> > In particular, __free_pages_prepare() does:
> >
> > page->flags.f &= ~PAGE_FLAGS_CHECK_AT_PREP;
> >
> > This non-atomic read-modify-write, while correctly excluding
> > __PG_HWPOISON from the mask, can still lose a concurrent
> > TestSetPageHWPoison if the read happens before the poison bit
> > is set and the write happens after. Will only get worse if/when
> > we add more non-atomic flag operations.
> >
> > Fix by acquiring zone->lock around TestSetPageHWPoison and
> > around ClearPageHWPoison in the retry path. This
> > serializes with all buddy flag manipulation. The cost is
> > negligible: one lock/unlock in an extremely rare path
> > (hardware memory errors).
> >
> > Note: SetPageHWPoison and TestClearPageHWPoison calls elsewhere
> > in this file operate on pages already removed from the buddy
> > allocator or on non-buddy pages (DAX, hugetlb), so they do not
> > need zone->lock protection.
>
> Sashiko is saying this doesn't do anything "Because
> __free_pages_prepare() executes entirely locklessly". Did it goof?
>
> https://sashiko.dev/#/patchset/df06b66fe4ff8e925ee0714955abc2183a727b90.1780998980.git.mst@xxxxxxxxxx
>

Oh. So it only helps with the prezero patches, and maybe with other
places where flags are touched locklessly, but not with
__free_pages_prepare(). I was too focused on that. Scrap this please.
I'll try to think of something.
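
For the archive, here is a rough userspace sketch of the lost update
described in the changelog, replayed single-threaded so the bad
interleaving is deterministic. It is not kernel code: the DEMO_* bit
values and the function names are made up for illustration, and C11
atomics stand in for the kernel's bitops. The point it tries to show is
that the poison bit is lost whenever the atomic set lands between the
free path's read and its write-back, so a lock taken only on the poison
side cannot close the window.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit layout, for illustration only; not the kernel's. */
#define DEMO_PG_HWPOISON   (1UL << 0)
#define DEMO_PG_CHECK_PREP (1UL << 1) /* stands in for PAGE_FLAGS_CHECK_AT_PREP */

/* Replays the racy ordering; returns true if the poison bit survived. */
static bool poison_survives_interleaved(void)
{
	_Atomic unsigned long flags = DEMO_PG_CHECK_PREP;

	/* Free path, step 1: plain read of the flags word. */
	unsigned long snapshot = atomic_load(&flags);

	/* Poison path: the atomic set lands inside the window. */
	atomic_fetch_or(&flags, DEMO_PG_HWPOISON);

	/* Free path, step 2: write back the stale, masked snapshot. */
	atomic_store(&flags, snapshot & ~DEMO_PG_CHECK_PREP);

	return (atomic_load(&flags) & DEMO_PG_HWPOISON) != 0;
}

/* Same two operations, but the free path's read-modify-write runs as a unit. */
static bool poison_survives_serialized(void)
{
	_Atomic unsigned long flags = DEMO_PG_CHECK_PREP;

	/* Poison path runs entirely before the free path. */
	atomic_fetch_or(&flags, DEMO_PG_HWPOISON);

	/* Free path: read and write-back with nothing in between. */
	unsigned long snapshot = atomic_load(&flags);
	atomic_store(&flags, snapshot & ~DEMO_PG_CHECK_PREP);

	return (atomic_load(&flags) & DEMO_PG_HWPOISON) != 0;
}

/* Usage: the interleaved replay loses the bit, the serialized one keeps it. */
static void demo_check(void)
{
	assert(!poison_survives_interleaved());
	assert(poison_survives_serialized());
}
```

Calling demo_check() from a test main() exercises both orderings. The
serialized case only holds if both sides honor the same exclusion,
which is exactly what a lockless read-modify-write does not do.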


--
MST