Re: [PATCH splitout] mm: memory-failure: serialize TestSetPageHWPoison with zone->lock
From: Michael S. Tsirkin
Date: Thu Jun 11 2026 - 02:36:36 EST
On Tue, Jun 09, 2026 at 08:38:09PM +0200, David Hildenbrand (Arm) wrote:
> On 6/9/26 20:10, Andrew Morton wrote:
> > On Tue, 9 Jun 2026 06:12:49 -0400 "Michael S. Tsirkin" <mst@xxxxxxxxxx> wrote:
> >
> >> TestSetPageHWPoison() is called without zone->lock, so its atomic
> >> update to page->flags can race with non-atomic flag operations
> >> that run under zone->lock in the buddy allocator.
> >>
> >> In particular, __free_pages_prepare() does:
> >>
> >> page->flags.f &= ~PAGE_FLAGS_CHECK_AT_PREP;
> >>
> >> This non-atomic read-modify-write, while correctly excluding
> >> __PG_HWPOISON from the mask, can still lose a concurrent
> >> TestSetPageHWPoison if the read happens before the poison bit
> >> is set and the write happens after. Will only get worse if/when
> >> we add more non-atomic flag operations.
> >>
> >> Fix by acquiring zone->lock around TestSetPageHWPoison and
> >> around ClearPageHWPoison in the retry path. This
> >> serializes with all buddy flag manipulation. The cost is
> >> negligible: one lock/unlock in an extremely rare path
> >> (hardware memory errors).
> >>
> >> Note: SetPageHWPoison and TestClearPageHWPoison calls elsewhere
> >> in this file operate on pages already removed from the buddy
> >> allocator or on non-buddy pages (DAX, hugetlb), so they do not
> >> need zone->lock protection.
> >
> > Sashiko is saying this doesn't do anything "Because
> > __free_pages_prepare() executes entirely locklessly". Did it goof?
> >
> > https://sashiko.dev/#/patchset/df06b66fe4ff8e925ee0714955abc2183a727b90.1780998980.git.mst@xxxxxxxxxx
>
> Battle of the bots: it's right.
Ugh, it's bot against human. I remembered that the buddy allocator
normally runs under zone->lock, assumed that would help here, and
didn't double-check that __free_pages_prepare() doesn't take it.
The bot wins :(
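For anyone following along, below is a rough sketch of why the patch can't help. It replays the bad interleaving from the patch description step by step in a single thread. The bit values (FAKE_PG_HWPOISON, FAKE_CHECK_AT_PREP) and both helper functions are hypothetical stand-ins, not kernel code. zone->lock appears only as a comment, because the freeing side never takes it.

```c
#include <stdbool.h>

/* Hypothetical bit layout for illustration only; not the kernel's values. */
#define FAKE_PG_HWPOISON   (1UL << 8)
#define FAKE_CHECK_AT_PREP ((1UL << 0) | (1UL << 1) | (1UL << 2))

/*
 * Single-threaded replay of the interleaving: the freeing side does a
 * lockless, non-atomic read-modify-write, and the poisoning side sets
 * the poison bit between the read and the write.
 */
unsigned long replay_lost_update(unsigned long flags)
{
	/* Freeing side, step 1: load page->flags into a register. */
	unsigned long snapshot = flags;

	/*
	 * Poisoning side: zone->lock would be taken around this set, but
	 * the freeing side does not hold that lock, so nothing is excluded.
	 */
	flags |= FAKE_PG_HWPOISON;

	/* Freeing side, step 2: store the masked, now stale, value back. */
	flags = snapshot & ~FAKE_CHECK_AT_PREP;

	return flags;
}

/*
 * If both sides serialized on the same lock, the two updates could only
 * run in one order or the other. The mask excludes the poison bit, so
 * the bit survives in both orders.
 */
bool poison_survives_serialized(unsigned long flags)
{
	unsigned long set_then_clear =
		(flags | FAKE_PG_HWPOISON) & ~FAKE_CHECK_AT_PREP;
	unsigned long clear_then_set =
		(flags & ~FAKE_CHECK_AT_PREP) | FAKE_PG_HWPOISON;

	return (set_then_clear & FAKE_PG_HWPOISON) &&
	       (clear_then_set & FAKE_PG_HWPOISON);
}
```

With these values, replay_lost_update(0x7) returns 0. The poison bit set between the load and the store is lost. poison_survives_serialized(0x7) returns true. A lock only serializes against code that also holds it.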
> --
> Cheers,
>
> David