Re: [PATCH splitout] mm: memory-failure: serialize TestSetPageHWPoison with zone->lock
From: Zi Yan
Date: Tue Jun 09 2026 - 14:40:03 EST
On 9 Jun 2026, at 14:38, David Hildenbrand (Arm) wrote:
> On 6/9/26 20:10, Andrew Morton wrote:
>> On Tue, 9 Jun 2026 06:12:49 -0400 "Michael S. Tsirkin" <mst@xxxxxxxxxx> wrote:
>>
>>> TestSetPageHWPoison() is called without zone->lock, so its atomic
>>> update to page->flags can race with non-atomic flag operations
>>> that run under zone->lock in the buddy allocator.
>>>
>>> In particular, __free_pages_prepare() does:
>>>
>>> page->flags.f &= ~PAGE_FLAGS_CHECK_AT_PREP;
>>>
>>> This non-atomic read-modify-write, while correctly excluding
>>> __PG_HWPOISON from the mask, can still lose a concurrent
>>> TestSetPageHWPoison if the read happens before the poison bit
>>> is set and the write happens after. Will only get worse if/when
>>> we add more non-atomic flag operations.
>>>
>>> Fix by acquiring zone->lock around TestSetPageHWPoison and
>>> around ClearPageHWPoison in the retry path. This
>>> serializes with all buddy flag manipulation. The cost is
>>> negligible: one lock/unlock in an extremely rare path
>>> (hardware memory errors).
>>>
>>> Note: SetPageHWPoison and TestClearPageHWPoison calls elsewhere
>>> in this file operate on pages already removed from the buddy
>>> allocator or on non-buddy pages (DAX, hugetlb), so they do not
>>> need zone->lock protection.
>>
>> Sashiko is saying this doesn't do anything "Because
>> __free_pages_prepare() executes entirely locklessly". Did it goof?
>>
>> https://sashiko.dev/#/patchset/df06b66fe4ff8e925ee0714955abc2183a727b90.1780998980.git.mst@xxxxxxxxxx
>
> Battle of the bots: it's right.
Yep, __free_pages_prepare() changes the page flag without holding
zone->lock.
Best Regards,
Yan, Zi