Re: [PATCH 0/2] mm: memory-failure: fix HWPoison flag race with non-atomic page flag ops
From: Michael S. Tsirkin
Date: Mon Jun 29 2026 - 16:44:34 EST
On Mon, Jun 29, 2026 at 07:04:25PM +0200, David Hildenbrand (Arm) wrote:
> On 6/29/26 18:54, Andi Kleen wrote:
> >> However, this was a basic test, when allocating 4k pages. With 2M hugepages:
> >>
> >>              insns/iter          cycles/iter
> >> -------------------------------------------------------
> >> base         20758 +/- 12.5      191208 +/- 1946.6
> >> rcu          20785 +/-  3.7      197108 +/-  132.1
> >> atomic       20727 +/-  6.4      204591 +/-  160.2
> >>
> >> rcu vs base      +27 (+0.13%)     +5900 (+3.09%)
> >> atomic vs base   -31 (-0.15%)    +13383 (+7.00%)
> >>
> >> and even with THP:
> >>
> >>              insns/iter          cycles/iter
> >> -------------------------------------------------------
> >> base         27220 +/-  2.8      192151 +/-  483.3
> >> rcu          27248 +/- 30.1      194159 +/- 2746.6
> >> atomic       27186 +/-  3.2      200526 +/-  746.2
> >>
> >> rcu vs base      +28 (+0.10%)     +2008 (+1.04%)
> >> atomic vs base   -34 (-0.12%)     +8374 (+4.36%)
> >>
> >>
> >> needs more thought.
> >
> > Well the alternative is to not bother with RCU, but just wait a bit and
> > check if the bit stuck and repeat if needed. While that could in theory
> > livelock it is extremely unlikely (especially if you add a bit of randomization
> > to the sleep)
>
> We discussed that a bit already. Hypervisors make it fairly unpredictable how
> long you would actually have to spin.
The way I see it, this is not the issue. The issue is that waiting and
rechecking does not fix the race:

CPU1:
	read flags
CPU2:
	test and set
	test and set #2 - sees it is set
CPU1:
	write flags back, clearing the bit
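
To make the interleaving concrete, here is a minimal single-threaded sketch
in userspace C11 that replays the four steps above in order. All names in it
(POISON_BIT, OTHER_BIT, test_and_set(), replay_interleaving()) are made up
for illustration and are not kernel APIs; atomic_fetch_or() stands in for an
atomic test-and-set, and CPU1's separate load and store stand in for a
non-atomic flag update.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag bits, for illustration only. */
#define POISON_BIT (1UL << 0)	/* stand-in for the HWPoison flag */
#define OTHER_BIT  (1UL << 1)	/* unrelated flag that CPU1 wants to clear */

struct replay_result {
	bool first_tas_saw_set;		/* CPU2, first test and set */
	bool second_tas_saw_set;	/* CPU2, recheck after waiting */
	bool poison_survived;		/* is the bit still set at the end? */
};

/* Atomic test-and-set: returns true if the bit was already set. */
static bool test_and_set(atomic_ulong *flags, unsigned long bit)
{
	return (atomic_fetch_or(flags, bit) & bit) != 0;
}

/* Replays the CPU1/CPU2 interleaving sequentially, in the order shown. */
static struct replay_result replay_interleaving(void)
{
	struct replay_result r;
	atomic_ulong flags;

	atomic_init(&flags, OTHER_BIT);

	/* CPU1: read flags (first half of a non-atomic read-modify-write). */
	unsigned long snapshot = atomic_load(&flags);
	assert((snapshot & POISON_BIT) == 0);	/* snapshot predates the set */

	/* CPU2: test and set. */
	r.first_tas_saw_set = test_and_set(&flags, POISON_BIT);

	/* CPU2: test and set #2 - sees it is set, concludes the bit stuck. */
	r.second_tas_saw_set = test_and_set(&flags, POISON_BIT);

	/* CPU1: write flags from the stale snapshot, clearing the bit. */
	atomic_store(&flags, snapshot & ~OTHER_BIT);

	r.poison_survived = (atomic_load(&flags) & POISON_BIT) != 0;
	return r;
}
```

In this replay the second test and set reports the bit as set, yet the bit is
gone once CPU1's stale write lands. So the recheck passing tells CPU2 nothing
about a write that is still pending on CPU1, no matter how long it waited
between the two attempts.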
> --
> Cheers,
>
> David