Re: [RFC PATCH] nmi,printk: fix ABBA deadlock between nmi_backtrace and dump_stack_lvl

From: Rik van Riel
Date: Wed Jul 24 2024 - 12:56:17 EST


On Wed, 2024-07-24 at 14:56 +0200, Petr Mladek wrote:
> On Thu 2024-07-18 16:15:43, John Ogness wrote:
>
> >
> > However, my first suggestion to defer whenever the cpu_sync is held
> > really is the only option because console_unlock() will spin on the
> > uart port lock, and that is also not allowed when holding the cpu_sync.
>
> It would have helped if Rick added backtraces from the crash dumps.
> He just wrote:
>
I would have preferred that, as well.

However, this deadlock prevented us from capturing a good backtrace
of the CPU that was stuck on the lock while in the NMI handler!

We ended up having to dig through the NMI stack of the stuck CPU
by hand, looking for anything that might be a function address,
and guess at what the CPU was doing.

This deadlock not only prevented the CPU from printing, but also from
dumping its register state at panic time before the kdump kernel
was kexeced.

--
All Rights Reversed.