Re: Question about qspinlock nest

From: Peter Zijlstra
Date: Fri Jan 18 2019 - 05:02:52 EST


On Mon, Jan 14, 2019 at 01:54:49PM +0000, James Morse wrote:
> On 14/01/2019 13:16, Peter Zijlstra wrote:

> > What avoids the trivial self-recursion:
> >
> > spin_lock(&)
> > <NMI>
> > spin_lock(&x)
> > ... wait forever more ...
> > </NMI>
> > spin_unlock(&x)
> >
> > ?
>
> If its trying to take the same lock, I agree its deadlocked.
> If the sequence above started with <NMI>, I agree its deadlocked.
>
> APEI/GHES is doing neither of these things. It take a lock that is only ever
> taken in_nmi(). nmi_enter()s BUG_ON(in_nmi()) means these never become re-entrant.

Urgh.. yes. I abhor that spinlock usage, but you're correct. If they're
only ever used in the NMI then it ought to work.

/me digs around... Bugger we have more like that :/

> What is the lock doing? Protecting the 'NMI' fixmap slot in the unlikely case
> that two CPUs end up in here at the same time.
>
> (I though x86's NMI masked NMI until the next iret?)

Correct; x86 has his 'feature' where IRET will unmask the NMI, so we
have something quite terrible to deal with that, don't ask and I shall
not have to explain :-)

> This is murkier on arm64 as we have multiple things that behave like this, but
> there is an order to them, and none of them can interrupt themselves.

Well, x86 too has multiple non-maskable vectors, and afaik only the
actual NMI vector is covered in tricky. But our MCE vector is
non-maskable too (and I have vague memories of there being more).

Boris, Rostedt, WTH happens if our MCE code goes and hits a #BP ? (not
unlikely with this proliferation of self-modifying code)

Anyway, the idea is that they can indeed not interrupt themselves, but I
would not be surprised if the whole MCE thing is riddled with fail (on
x86).

> e.g. We can't take an SError during the SError handler.
>
> But we can take this SError/NMI on another CPU while the first one is still
> running the handler.
>
> These multiple NMIlike notifications mean having multiple locks/fixmap-slots,
> one per notification. This is where the qspinlock node limit comes in, as we
> could have more than 4 contexts.

Right; so Waiman was going to do a patch that reverts to test-and-set or
something along those lines once we hit the queue limit, which seems
like a good way out. Actually hitting that nesting level should be
exceedingly rare.