Re: [PATCH] printk/nmi: Prevent deadlock when serializing NMI backtraces

From: Petr Mladek
Date: Wed Jun 06 2018 - 07:16:05 EST

On Wed 2018-06-06 14:10:29, Sergey Senozhatsky wrote:
> On (06/05/18 14:47), Petr Mladek wrote:
> [..]
> > Grr, the ABBA deadlock is still there. NMIs are not sent to the other
> > CPUs atomically. Even if we detect that logbuf_lock is available
> > in printk_nmi_enter() on some CPUs, it might still get locked on
> > another CPU before the other CPU gets NMI.
> Can we do something about "B"? :) I mean - any chance we can rework
> locking in nmi_cpu_backtrace()?

I can't think of any possibility at the moment. Well, it does not mean
that it does not exist.

The irony is that we need the extra lock in nmi_cpu_backtrace() only
because we try to store the messages into the common log buffer.
If we always use the per-CPU buffers in NMI then the lock would
not cause any problems but it also won't be necessary.

> > By other words, any check in printk_safe_enter() is racy and not
> > sufficient
> I suppose you meant printk_nmi_enter().

Yup, I am sorry for confusion.

> > => I suggest to revert the commit 719f6a7040f1bdaf96fcc70
> > "printk: Use the main logbuf in NMI when logbuf_lock is available"
> > for-4.18 and stable until we get a better solution.
> Just random thoughts.
> May be we need to revert it, but let's not "panic". I think [but don't
> insist on it] that the patch in question is *probably* "good enough". It
> addresses a bug report after all.

It was a problem reported by me. I found it when testing other changes.
The patch improved the situation definitely. The question is if it is
enough in practice.

> How often do we have arch_trigger_cpumask_backtrace() on all CPUs these
> days? I tend to think that it used to be much more popular in the past,
> because we had a loops_per_jiffy based spin_lock lockup detection which
> would trigger NMI backtracase, but this functionality has gone, see
> bc88c10d7e6900916f5e1ba3829d66a9de92b633 for details.

AFAIK, we (SUSE) started to work on the printk/NMI deadlocks because
it was too risky to use sysrq-l on systems with many CPUs. It means that
the use case is still there.

Hmm, the original problem was that any interrupted logbuf_lock owner
caused the deadlock. It is _different now_ because the interrupted
logbuf_lock owner must block another CPU via the lock in

I actually wonder why this patch actually helped so much in my test
(while true; do echo l >/proc/sysrq-trigger; done). My guess is
that it increased a lot the number of CPUs using PRINTK_NMI_CONTEXT
(per-CPU buffers). Therefore the deadlock was less likely.
On the other hand, the original problem was basically back
without this patch.

> I'm not saying that the race condition that you found is
> unrealistic, I'm just saying that _it seems_ that
> nmi_panic()->printk() on a single CPU is more common
> now, so having that nmi_printk()->printk_deferred() might be quite
> valuable at the end of the day.

I see the point. I need to think about it.

> May be I'm wrong!
> > The only safe solution seems to be a trylock() in NMI in
> > vprintk_emit() and fallback to vprintk_safe() when the lock
> > is not taken. I mean something like:
> >
> > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> > index 247808333ba4..4a5a0bf221b3 100644
> > --- a/kernel/printk/printk.c
> > +++ b/kernel/printk/printk.c
> > @@ -1845,7 +1845,13 @@ asmlinkage int vprintk_emit(int facility, int level,
> > printk_delay();
> >
> > /* This stops the holder of console_sem just where we want him */
> > - logbuf_lock_irqsave(flags);
> > + printk_safe_enter_irqsave(flags);
> > + if (in_nmi() && !raw_spin_trylock(&logbuf_lock)) {
> > + vprintk_nmi(fmt, args);
> > + printk_safe_exit_irqrestore(flags);
> > + return;
> > + } else
> > + raw_spin_lock(&logbuf_lock);
> > /*
> > * The printf needs to come first; we need the syslog
> > * prefix which might be passed-in as a parameter.
> I need some time to think about it.

The bad thing about this solution is that different messages might
be handled different way during a single nmi_cpu_backtrace().
The result might be mixed backtraces in the log.

> > Sigh, this looks like a material for-4.19.
> Agreed.
> > We might need to revisit if printk_context still makes sense, ...

> What do you mean by this?

My dream was to select the right vprintk implementation according
to printk_context on a single place (vprintk_func()). If this is not
usable for NMI, we might need to think about another solution.
But this is premature. I believe the printk_context will most
likely make still sense.

I need to think more about it.

Best Regards,