Re: [PATCH] printk/nmi: Prevent deadlock when serializing NMI backtraces

From: Petr Mladek
Date: Fri Jun 08 2018 - 06:48:33 EST


On Wed 2018-06-06 19:33:56, Sergey Senozhatsky wrote:
> On (06/06/18 14:10), Sergey Senozhatsky wrote:
> > On (06/05/18 14:47), Petr Mladek wrote:
> > [..]
> > > Grr, the ABBA deadlock is still there. NMIs are not sent to the other
> > > CPUs atomically. Even if we detect that logbuf_lock is available
> > > in printk_nmi_enter() on some CPUs, it might still get locked on
> > > another CPU before the other CPU gets NMI.
> >
> > Can we do something about "B"? :) I mean - any chance we can rework
> > locking in nmi_cpu_backtrace()?
>
> Sorry, I don't have that much free time at the moment, so can't
> fully concentrate on this issue.
>
> Here is a quick-n-dirty thought.
>
> The whole idea of printk_nmi() was to reduce the number of locks
> we take performing printk() from NMI context - ideally down to 0.
> We added logbuf spin_lock later on. One lock was fine. But then
> we added another one - nmi_cpu_backtrace() lock. And two locks is
> too many. So can we drop the nmi_cpu_backtrace() lock, and in
> exchange extend printk-safe API with functions that will disable/enable
> PRINTK_NMI_DEFERRED_CONTEXT_MASK on a particular CPU?

I ended up with a similar conclusion. I am just nervous about the fact
that the check in printk_nmi_enter() will always be unreliable. We already
deal with situations where we want to check the actual lock state
in panic(). Also, the lock in nmi_cpu_backtrace() need not be
the only lock serializing NMIs. I am always surprised by what code
can be called in NMI context.

I played with this a bit and came up with the following: