Re: [PATCH 0/2] mm: memory-failure: fix HWPoison flag race with non-atomic page flag ops

From: Michael S. Tsirkin

Date: Mon Jun 29 2026 - 16:44:34 EST


On Mon, Jun 29, 2026 at 07:04:25PM +0200, David Hildenbrand (Arm) wrote:
> On 6/29/26 18:54, Andi Kleen wrote:
> >> However, this was a basic test, when allocating 4k pages. With 2M hugepages:
> >>
> >>           insns/iter            cycles/iter
> >> -------------------------------------------------------
> >> base      20758 +/- 12.5       191208 +/- 1946.6
> >> rcu       20785 +/-  3.7       197108 +/-  132.1
> >> atomic    20727 +/-  6.4       204591 +/-  160.2
> >>
> >> rcu vs base       +27 (+0.13%)     +5900 (+3.09%)
> >> atomic vs base    -31 (-0.15%)    +13383 (+7.00%)
> >>
> >> and even with THP:
> >>
> >>           insns/iter            cycles/iter
> >> -------------------------------------------------------
> >> base      27220 +/-  2.8       192151 +/-  483.3
> >> rcu       27248 +/- 30.1       194159 +/- 2746.6
> >> atomic    27186 +/-  3.2       200526 +/-  746.2
> >>
> >> rcu vs base       +28 (+0.10%)     +2008 (+1.04%)
> >> atomic vs base    -34 (-0.12%)     +8374 (+4.36%)
> >>
> >>
> >> needs more thought.
> >
> > Well the alternative is to not bother with RCU, but just wait a bit and
> > check if the bit stuck and repeat if needed. While that could in theory
> > livelock it is extremely unlikely (especially if you add a bit of randomization
> > to the sleep)
>
> We discussed that a bit already. Hypervisors make it fairly unpredictable how
> long you would actually have to spin.


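The way I see it, that is not the issue. The issue is that waiting and
re-checking does not fix the race, because the non-atomic writer can still
store its stale flags after the re-check has passed: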


CPU1                            CPU2

read flags
                                test and set
                                test and set #2 - sees it is set
write flags clearing the bit
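
To make the interleaving concrete, here is a minimal user-space sketch in
C11 that replays the four steps in a single thread, so the outcome is
deterministic. It is not kernel code: the bit names PG_HWPOISON_BIT and
PG_OTHER_BIT and the helpers test_and_set() and replay_race() are made up
for illustration, and the second test-and-set stands in for the "wait a bit
and check if the bit stuck" step under one reading of that proposal.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit positions, chosen only for this illustration. */
#define PG_HWPOISON_BIT (1UL << 0)
#define PG_OTHER_BIT    (1UL << 1)

/* Atomic test-and-set: returns true if the bit was already set. */
static bool test_and_set(atomic_ulong *flags, unsigned long bit)
{
	return (atomic_fetch_or(flags, bit) & bit) != 0;
}

/*
 * Replay the interleaving step by step in one thread and return the
 * final flags word.
 */
static unsigned long replay_race(void)
{
	atomic_ulong flags = 0;
	unsigned long snapshot;
	bool was_set;

	/* CPU1: start of a non-atomic read-modify-write: read flags. */
	snapshot = atomic_load(&flags);

	/* CPU2: test and set; the bit was clear, so CPU2 sets it. */
	was_set = test_and_set(&flags, PG_HWPOISON_BIT);
	assert(!was_set);

	/* CPU2: test and set #2 after waiting; sees it is set. */
	was_set = test_and_set(&flags, PG_HWPOISON_BIT);
	assert(was_set);

	/* CPU1: write back the stale snapshot plus its own bit. */
	atomic_store(&flags, snapshot | PG_OTHER_BIT);

	return atomic_load(&flags);
}
```

replay_race() returns a flags word with PG_OTHER_BIT set and
PG_HWPOISON_BIT clear: the re-check on CPU2 passed, yet the stale
write-back on CPU1 still dropped the bit.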




> --
> Cheers,
>
> David