Re: [PATCH 0/2] mm: memory-failure: fix HWPoison flag race with non-atomic page flag ops

From: David Hildenbrand (Arm)

Date: Mon Jun 29 2026 - 04:22:07 EST


On 6/29/26 10:10, Michael S. Tsirkin wrote:
> On Sun, Jun 28, 2026 at 07:11:58PM -0700, Andi Kleen wrote:
>> On Sun, Jun 28, 2026 at 05:45:22PM -0400, Michael S. Tsirkin wrote:
>>> This series fixes the race by:
>>>
>>> 1. Having memory_failure() call synchronize_rcu() + retry after
>>> setting HWPoison, so that any in-flight non-atomic RMW that
>>> read the old flags value completes before we proceed.
>>>
>>> 2. Wrapping all non-atomic page flag operations in
>>> rcu_read_lock/rcu_read_unlock (CONFIG_MEMORY_FAILURE only),
>>> so that synchronize_rcu() actually drains them.
>>
>> It wouldn't surprise me if your underlying performance assumptions
>> -- a non-contended atomic is cheaper than a rcu_read_lock/unlock --
>> are not true in various CPU/kernel configuration combinations.
>>
>> Modern CPUs have fast atomics when uncontended in normal circumstances.
>> But it probably doesn't matter much either way because the difference
>> shouldn't be very much.
>
>
> Hmm. It's a bit silly that I didn't try. Seemed clear to me, but,
> on this old xeon...
>
>                    insns/iter           cycles/iter
> -------------------------------------------------------
> base               12238 +/- 1.0        17889 +/-  97.9
> rcu_read_lock      12251 +/- 7.3        17991 +/- 191.6
> atomic ops         12233 +/- 1.9        17733 +/- 136.5
>
>
> The diff is in the noise.
>
> And old, slow CPUs maybe don't have MF at all.
>
> So maybe just atomics instead of all this mess.

That would be much better.

What I was concerned about so far was that many distributions enable hwpoison
handling unconditionally (independent of any specific CPU!).

I recall running experiments on some not-so-dated hardware 2 years ago (when
optimizing out rmap atomics) where additional atomics really hurt, even in
uncontended cases.

>
>> It seems very complicated for something that
>> could be much simpler.
>>
>> But I guess it's fine.
>>
>> -Andi
>
> Indeed. David already said he's gonna look at this himself, but he

If we can go that simple route (I'm not sure yet), your patch would be fine. I
can try finding someone to run more experiments on arm64 hardware.

--
Cheers,

David