Re: [PATCH v4 01/39] task_work: Fix TWA_NMI_CURRENT error handling

From: Josh Poimboeuf
Date: Wed Jan 22 2025 - 15:47:30 EST

Next message: Chris Packham: "Re: [PATCH v4 2/4] dt-bindings: mfd: Add MDIO interface to rtl9301-switch"
Previous message: Andreas Kemnade: "Re: [PATCH] ARM: dts: omap4-panda-a4: Add missing model and compatible properties"
In reply to: Peter Zijlstra: "Re: [PATCH v4 01/39] task_work: Fix TWA_NMI_CURRENT error handling"
Next in thread: Peter Zijlstra: "Re: [PATCH v4 01/39] task_work: Fix TWA_NMI_CURRENT error handling"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Wed, Jan 22, 2025 at 01:28:21PM +0100, Peter Zijlstra wrote:
> On Tue, Jan 21, 2025 at 06:30:53PM -0800, Josh Poimboeuf wrote:
> > It's possible for irq_work_queue() to fail if the work has already been
> > claimed. That can happen if a TWA_NMI_CURRENT task work is requested
> > before a previous TWA_NMI_CURRENT IRQ work on the same CPU has gotten a
> > chance to run.
>
> I'm confused, if it fails then it's already pending, and we'll get the
> notification already. You can still add the work.

Yeah, I suppose that makes sense. If the pending irq_work is already
going to set TIF_NOTIFY_RESUME anyway, there's no need to do that again.
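The following is a minimal userspace model of Peter's point. It rests on some simplifying assumptions:

- `model_irq_work_queue()` stands in for irq_work_queue(), which returns false when the work is already claimed.
- `notify_resume` stands in for TIF_NOTIFY_RESUME.
- All names are hypothetical.
- For brevity, the list push is a plain store rather than the try_cmpxchg() loop.

A failed queue turns out to be harmless. The work is still added, and the irq_work that is already pending delivers the notification.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not the kernel's real types. */
struct cb { struct cb *next; };

struct model {
	atomic_bool irq_work_pending;  /* models the irq_work "claimed" state */
	bool notify_resume;            /* models TIF_NOTIFY_RESUME */
	struct cb *works;              /* models task->task_works */
};

/* Modeled on irq_work_queue(): returns false if already claimed. */
static bool model_irq_work_queue(struct model *m)
{
	bool expected = false;
	return atomic_compare_exchange_strong(&m->irq_work_pending,
					      &expected, true);
}

/* The pending irq_work eventually runs and sets the notify flag. */
static void model_irq_work_run(struct model *m)
{
	atomic_store(&m->irq_work_pending, false);
	m->notify_resume = true;
}

/* Add the work unconditionally, then try to queue the notification.
 * A false return only means a notification is already on its way.
 * (Plain push for brevity; the real code uses a try_cmpxchg() loop.) */
static bool model_task_work_add_nmi(struct model *m, struct cb *work)
{
	work->next = m->works;
	m->works = work;
	return model_irq_work_queue(m);
}
```

If you call `model_task_work_add_nmi()` twice before `model_irq_work_run()`, both works end up on the list, but only the first call actually queues the irq_work.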

> > The error has to be checked before the write to task->task_works. Also
> > the try_cmpxchg() loop isn't needed in NMI context. The TWA_NMI_CURRENT
> > case really is special; keep things simple by keeping its code all
> > together in one place.
>
> NMIs can nest,

Just for my understanding: for nested NMIs, the entry code basically
queues up the next NMI, so the C handler (exc_nmi) can't nest. Right?

> consider #DB (which is NMI like)

What exactly do you mean by "NMI like"? Is it because a #DB can effectively
run in NMI context if the NMI hit a breakpoint?

> doing task_work_add() and getting interrupted with NMI doing the same.

How exactly would that work? At least with my patch, the #DB wouldn't be
able to use TWA_NMI_CURRENT unless in_nmi() were true because the NMI hit
a breakpoint. In that case a nested NMI wouldn't actually nest; it would
get "queued" by the entry code.
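Here is a simplified single-CPU model of the "queued" nested-NMI behavior. It loosely follows the `nmi_state` latch (NOT_RUNNING, EXECUTING, LATCHED) that x86's exc_nmi() keeps. This is an illustration under assumptions, not the kernel's code. The real x86-64 entry assembly has its own nested-NMI handling, and the names and the `pending_nested` test hook below are hypothetical.

An NMI that arrives while the handler body is running only sets the latch. The outer invocation then reruns the body, so the body itself never nests.

```c
#include <stdbool.h>

/* Illustrative per-CPU NMI state, loosely modeled on x86's exc_nmi(). */
enum model_nmi_state {
	MODEL_NMI_NOT_RUNNING = 0,
	MODEL_NMI_EXECUTING,
	MODEL_NMI_LATCHED,
};

static enum model_nmi_state nmi_state = MODEL_NMI_NOT_RUNNING;
static int handler_depth;      /* current nesting depth of the body */
static int max_handler_depth;  /* deepest nesting ever observed */
static int handler_runs;       /* number of times the body ran */
static int pending_nested;     /* hypothetical hook: NMIs to inject */

static void model_exc_nmi(void);

/* Stands in for the C-level NMI work; may simulate a nested NMI. */
static void handler_body(void)
{
	handler_depth++;
	if (handler_depth > max_handler_depth)
		max_handler_depth = handler_depth;
	handler_runs++;
	if (pending_nested > 0) {
		pending_nested--;
		model_exc_nmi();  /* an NMI arrives mid-handler */
	}
	handler_depth--;
}

static void model_exc_nmi(void)
{
	if (nmi_state != MODEL_NMI_NOT_RUNNING) {
		/* Already handling an NMI: latch this one and return. */
		nmi_state = MODEL_NMI_LATCHED;
		return;
	}
	nmi_state = MODEL_NMI_EXECUTING;
restart:
	handler_body();
	/* LATCHED -> EXECUTING: another NMI came in, so run again.
	 * EXECUTING -> NOT_RUNNING: done. */
	nmi_state = (enum model_nmi_state)(nmi_state - 1);
	if (nmi_state != MODEL_NMI_NOT_RUNNING)
		goto restart;
}
```

With one injected NMI, the body runs twice, but its nesting depth never exceeds one.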

But yeah, I do see how the reverse can be true: somebody sets a
breakpoint in task_work, right where it's fiddling with the list head.
An NMI calls task_work_add(TWA_NMI_CURRENT) and hits the breakpoint,
triggering a #DB whose handler also calls task_work_add().
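The hazard in that scenario is a lost update to the list head. Suppose the #DB pushes a work after the NMI has read task->task_works but before the NMI writes it back. A plain store then discards the #DB's entry.

Below is a minimal sketch using C11 atomics. `atomic_compare_exchange_weak()` plays the role of the kernel's try_cmpxchg(); both refresh the expected value on failure. `interrupt_hook` is a hypothetical test hook that simulates the nested context arriving mid-update.

```c
#include <stdatomic.h>
#include <stddef.h>

struct cb { struct cb *next; };

/* Hypothetical hook: simulates a #DB or NMI arriving between reading
 * the list head and publishing the new head. */
typedef void (*interrupt_fn)(void);
static interrupt_fn interrupt_hook;

static void fire_interrupt_once(void)
{
	interrupt_fn fn = interrupt_hook;
	interrupt_hook = NULL;  /* fire at most once */
	if (fn)
		fn();
}

/* Unsafe: plain read-modify-write of the head. */
static void push_plain(_Atomic(struct cb *) *head, struct cb *work)
{
	struct cb *old = atomic_load(head);
	fire_interrupt_once();
	work->next = old;
	atomic_store(head, work);  /* overwrites anything pushed meanwhile */
}

/* Safe: only publish if the head is still the one we read; otherwise
 * 'old' is refreshed and we retry, like a try_cmpxchg() loop. */
static void push_cmpxchg(_Atomic(struct cb *) *head, struct cb *work)
{
	struct cb *old = atomic_load(head);
	do {
		fire_interrupt_once();
		work->next = old;
	} while (!atomic_compare_exchange_weak(head, &old, work));
}
```

With the hook armed, `push_plain()` leaves only one entry on the list. `push_cmpxchg()` instead retries once and keeps both entries. That is why the loop still matters whenever the caller can be interrupted by a context that touches the same list.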

--
Josh
