Re: [PATCH 0/2] mm: memory-failure: fix HWPoison flag race with non-atomic page flag ops

From: Michael S. Tsirkin

Date: Mon Jun 29 2026 - 18:00:33 EST


On Mon, Jun 29, 2026 at 02:39:44PM -0700, Andi Kleen wrote:
> > We can maybe batch a bunch of these, and do stop machine once?
>
> If you add a long delay you increase the risk that the machine or
> the critical process dies because there is an unrecoverable or
> process killing AR access before the page can be off lined.

Well, nothing prevents us from first running the MF machinery once,
opportunistically.

No regression then?

We then add the page to a batch, and once the batch is full we do a
single stop machine and reprocess the whole batch.
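
A rough userspace sketch of that flow, only to make the idea concrete.
All names here (mf_report, mf_handle_now, mf_flush_batch, MF_BATCH_SIZE)
are made up for illustration, and the flush is just a counter standing in
for a real stop-machine pass. None of this is taken from the patch series.

```c
#include <stddef.h>

#define MF_BATCH_SIZE 4	/* hypothetical batch size, chosen arbitrarily */

struct mf_batch {
	unsigned long pfns[MF_BATCH_SIZE];	/* pages waiting for the batched pass */
	size_t count;				/* number of queued pages */
	unsigned int fast_path_runs;		/* immediate, opportunistic runs */
	unsigned int stop_machine_runs;		/* batched passes performed */
	unsigned int reprocessed;		/* pages handled by batched passes */
};

/* Stand-in for the existing memory-failure handling, run right away. */
static void mf_handle_now(struct mf_batch *b, unsigned long pfn)
{
	(void)pfn;
	b->fast_path_runs++;
}

/* Stand-in for one stop-machine pass that reprocesses every queued page. */
static void mf_flush_batch(struct mf_batch *b)
{
	if (b->count == 0)
		return;
	b->stop_machine_runs++;
	b->reprocessed += b->count;
	b->count = 0;
}

/* Report one poisoned page: handle it immediately, then queue it. */
static void mf_report(struct mf_batch *b, unsigned long pfn)
{
	mf_handle_now(b, pfn);		/* no added delay before the first attempt */
	b->pfns[b->count++] = pfn;
	if (b->count == MF_BATCH_SIZE)	/* batch full: one pass for all of them */
		mf_flush_batch(b);
}
```

With a batch size of 4, ten reported pages cost ten immediate runs but only
two batched passes. A partially filled batch stays queued until something
calls mf_flush_batch(), so a real version would presumably need some bound
on how long that can take.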

Hmm?




>
> (BTW that's a problem even with the RCU approach, the cure might be
> worse than the disease)
>
> If you don't add a long delay you would do a lot of stop machines
> on a flood, likely bringing it to a halt, even though other sockets etc.
> might still be fine.
>
> -Andi