Re: [PATCH] printk/nmi: restore printk_func in nmi_panic
From: Petr Mladek
Date: Mon Feb 29 2016 - 05:31:49 EST

On Sat 2016-02-27 11:19:44, Sergey Senozhatsky wrote:
> Hello Petr,
>
> On (02/26/16 15:57), Petr Mladek wrote:
> > On Fri 2016-02-26 12:37:20, Sergey Senozhatsky wrote:
> > > When watchdog detects a hardlockup and calls nmi_panic() `printk_func'
> > > must be restored via printk_nmi_exit() call, so panic() will be able
> > > to flush nmi buf and show backtrace and panic message. We also better
> > > explicitly ask nmi to printk_nmi_flush() in console_flush_on_panic(),
> > > because it may be too late to rely on irq work.
> > >
> > > Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@xxxxxxxxx>
> > > ---
> > > include/linux/kernel.h | 6 ++++--
> > > kernel/printk/printk.c | 1 +
> > > 2 files changed, 5 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/include/linux/kernel.h b/include/linux/kernel.h
> > > index f4fa2b2..3ee33d5 100644
> > > --- a/include/linux/kernel.h
> > > +++ b/include/linux/kernel.h
> > > @@ -469,10 +469,12 @@ do { \
> > > cpu = raw_smp_processor_id(); \
> > > old_cpu = atomic_cmpxchg(&panic_cpu, PANIC_CPU_INVALID, cpu); \
> > > \
> > > - if (old_cpu == PANIC_CPU_INVALID) \
> > > + if (old_cpu == PANIC_CPU_INVALID) { \
> > > + printk_nmi_exit(); \
> >
> > This might end up in a deadlock that printk_nmi() wanted to avoid.
>
> aha, I see.
>
> > I think about a compromise. We should try to get the messages
> > out only when kdump is not enabled.
>
> can we zap_locks() if we are on nmi_panic()->panic()->console_flush_on_panic() path?

That is the problem. zap_locks() is not a solution.

First, it handles only logbuf_lock and console_sem. There are other
locks used by particular consoles that might cause a deadlock.

Second, re-initializing locks is dangerous on its own. If they are
released by some other CPU that is still running, you might end up
in a deadlock because of a double release. In fact, I think that it
actually increases the risk. If there are more than 2 CPUs, then
it is more likely that a printk is running on another CPU than
on the current one.
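
To illustrate the double-release hazard, here is a minimal user-space
sketch. It assumes a toy ticket lock, which is not the kernel's actual
spinlock implementation, and every name in it is hypothetical. One CPU
holds the lock, the panicking CPU re-initializes and takes it, and then
both release it. The owner counter runs past the ticket counter, so the
next locker would spin forever.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy ticket lock, for illustration only (hypothetical, not kernel code).
 * 'next' is the next ticket to hand out; 'owner' is the ticket being served.
 */
struct toy_ticket_lock {
	unsigned int next;
	unsigned int owner;
};

/* Stand-in for the lock re-initialization that zap_locks() performs. */
static void toy_lock_init(struct toy_ticket_lock *l)
{
	l->next = 0;
	l->owner = 0;
}

/* Hand out a ticket; the caller may enter once its ticket equals 'owner'. */
static unsigned int toy_take_ticket(struct toy_ticket_lock *l)
{
	return l->next++;
}

static bool toy_may_enter(const struct toy_ticket_lock *l, unsigned int ticket)
{
	return ticket == l->owner;
}

static void toy_unlock(struct toy_ticket_lock *l)
{
	l->owner++;
}

/*
 * Replays the sequence step by step on a single thread.
 * Returns true if a later locker would be stuck spinning.
 */
static bool zap_then_late_unlock_wedges_lock(void)
{
	struct toy_ticket_lock lock;
	unsigned int b, a, c;

	toy_lock_init(&lock);

	b = toy_take_ticket(&lock);		/* CPU B takes the lock */
	assert(toy_may_enter(&lock, b));

	toy_lock_init(&lock);			/* CPU A (panic path) zaps it */
	a = toy_take_ticket(&lock);
	assert(toy_may_enter(&lock, a));	/* A enters although B never left */

	toy_unlock(&lock);			/* CPU B, still running, releases */
	toy_unlock(&lock);			/* CPU A releases: second release */

	c = toy_take_ticket(&lock);		/* any later locker */
	return !toy_may_enter(&lock, c);	/* its ticket was already skipped */
}
```

In this model the zap also lets CPU A enter while CPU B is still inside
the critical section, so mutual exclusion is lost even before the lock
gets wedged.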

Peter Zijlstra had an idea of using the early console in this case.
I am not sure, but I guess that it does not have any internal locks.
But there is still the other problem with the double release.

I am afraid that the only solution is to make it configurable.
Some people might want to risk the deadlock and try to see the messages
on the console. Others might rather want to get the crashdump for sure,
at the cost that they will need to extract the NMI messages
from the per-CPU buffers.

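
The configurable trade-off could be sketched roughly as follows. This is
only an illustration in user-space C. The enum, the function, and the
crash_kernel_loaded parameter are hypothetical names, not an existing
kernel interface. The sketch also assumes that the "prefer crashdump"
setting still allows a flush when no crash kernel is loaded, since there
is no dump to protect in that case.

```c
#include <stdbool.h>

/* Hypothetical policy knob; no kernel option exists under these names. */
enum nmi_panic_policy {
	NMI_PANIC_PREFER_CRASHDUMP,	/* keep the path to the crash kernel safe */
	NMI_PANIC_PREFER_CONSOLE,	/* risk a deadlock to show the messages */
};

/*
 * Decide whether the panic path should try to push the NMI messages
 * to the console. 'crash_kernel_loaded' is an assumed input that says
 * whether a crashdump kernel is ready to take over.
 */
static bool nmi_panic_should_flush_console(enum nmi_panic_policy policy,
					    bool crash_kernel_loaded)
{
	if (policy == NMI_PANIC_PREFER_CONSOLE)
		return true;

	/* Prefer the crashdump: only flush when there is no dump to lose. */
	return !crash_kernel_loaded;
}
```

With the crashdump preference and a loaded crash kernel, the messages
stay in the per-CPU buffers and have to be extracted from the dump.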
Best Regards,
Petr