Re: [PATCH] printk/kdb: Redirect printk messages into kdb in any context

From: Daniel Thompson
Date: Wed May 20 2020 - 05:36:06 EST


On Wed, May 20, 2020 at 01:21:02PM +0900, Sergey Senozhatsky wrote:
> On (20/05/18 11:21), Petr Mladek wrote:
> [..]
> > > > Is this guaranteed that we never execute this path from NMI?
> >
> > Good question!
> >
> > > Absolutely not.
> > >
> > > The execution context for kdb is pretty much unique... we are running a
> > > debug mode with all CPUs parked in a holding loop with interrupts
> > > disabled. One CPU is at an unknown exception state and the others are
> > > either handling an IRQ or NMI depending on architecture[1].
> >
> > This is similar to the situation in panic() when other CPUs are
> > stopped. It is more safe when the CPUs are stopped using IRQ.
> > There is higher danger of a deadlock when NMI is used.
> >
> > bust_spinlock() is used in panic() to increase the chance to go over
> > the deadlock and actually see the messages. It is not enough when
> > more locks are used by the console (VT/TTY is good example). And
> > it is not guaranteed that the console will still work after
> > the hack is disabled by bust_spinlocks(0).
>
> Good point. It's not guaranteed to help, but bust_spinlocks() does
> help in general, many serial drivers do check oops_in_progress and
> use a deadlock safe approach when locking port lock. I don't see
> bust_spinlocks() being used in kdb, so it probably better start
> doing so (along with general for_each_console() loop improvements,
> like checking if console is enabled/available/etc).

Agree.

I am also interested in whether we can figure out a way to match
consoles and polling drivers. It is better to use the polling
driver rather than the console since the polling drivers "know"
they will execute from all sorts of crazy places. For the most common
use cases this would also result in no console handler ever being
called.

BTW for those asking how this issue an submarine for so long I think
the main factor is that not all architectures implement NMI.

There are exceptions but kdb is typically only useful when either:

1. We have a real (e.g. not USB) serial port, or
2. We have a real (e.g. not USB) keyboard

On x86, where the SMP peers are rounded up using using an NMI, the
above grows increasingly hard to find (although they are certainly
still here). Combined with this very few commonly embedded
architectures round up using an NMI so these problems cannot occur.

This has allowed kdb to fall rather far behind the rest of the kernel
w.r.t. NMI robustness whilst not being entirely useless.

Sumit's recent work to exploit NMIs on arm64 is bringing some of our
debt to the surface... happily I think that will also help us to tackle
it!


> [..]
> > > > If so, can this please be added to the commit message? A more
> > > > detailed commit message will help a lot.
> >
> > What about?
> >
> > "KDB has to get messages on consoles even when the system is stopped.
> > It uses kdb_printf() internally and calls console drivers on its own.
> >
> > It uses a hack to reuse an existing code. It sets "kdb_trap_printk"
> > global variable to redirect even the normal printk() into the
> > kdb_printf() variant.
> >
> > The variable "kdb_trap_printk" is checked in printk_default() and
> > it is ignored when printk is redirected to printk_safe in NMI context.
> > Solve this by moving the check into printk_func().

Maybe a "Currently" thrown in we switch from general background
information to describing the problem the patch is about to fix helps us
read it:

Currently the variable "kdb_trap_printk" is checked

But other than that LGTM.


Daniel.

> >
> > It is obvious that it is not fully safe. But it does not make things
> > worse. The console drivers are already called in this context by
> > kdb_printf() direct calls."
>
> This looks more informative indeed. Thanks!