Re: [PATCH] kgdb: Fix broken handling of printk() in NMI context

From: Sumit Garg
Date: Fri May 15 2020 - 01:47:14 EST


Hi Petr,

On Thu, 14 May 2020 at 14:13, Petr Mladek <pmladek@xxxxxxxx> wrote:
>
> On Wed 2020-05-13 19:04:48, Sumit Garg wrote:
> > On Tue, 12 May 2020 at 19:55, Daniel Thompson
> > <daniel.thompson@xxxxxxxxxx> wrote:
> > >
> > > On Tue, May 12, 2020 at 02:18:34PM +0530, Sumit Garg wrote:
> > > > Since commit 42a0bb3f7138 ("printk/nmi: generic solution for safe printk
> > > > in NMI"), kgdb entry in NMI context defaults to use safe NMI printk()
> > >
> > > I didn't see the author on Cc: nor any of the folks whose hands it
> > > passed through. It would definitely be good to involve them in this
> > > discussion.
> > >
> >
> > Thanks for updating the Cc: list.
> >
> > >
> > > > which involves CPU specific buffers and deferred printk() until exit from
> > > > NMI context.
> > > >
> > > > But kgdb being a stop-the-world debugger, we don't want to defer printk()
> > > > especially backtrace on corresponding CPUs. So instead switch to normal
> > > > printk() mode in kgdb_cpu_enter() if entry is in NMI context.
> > >
> > > So, firstly I should *definitely* take a mea cupla for not shouting
> > > about this at the time (I was on Cc:... twice). Only thing I can say
> > > confidently is that the test suite didn't yell about this and so I
> > > didn't look at this as closely as I should have done (and that it
> > > didn't yell is mostly because I'm still building out the test suite
> > > coverage).
> > >
> > > Anyhow...
> > >
> > > This feels a little like we are smearing the printk() interception logic
> > > across the kernel in ways that make things hard to read. If we accepted
> > > this patch we then have, the new NMI interception logic, the old kdb
> > > interception logic and some hacks in the kgdb trap handler to defang the
> > > NMI interception logic and force the kdb logic to kick in.
> > >
> > > Wouldn't it be better to migrate kdb interception logic up a couple of
> > > levels so that it continues to function even when we are in nmi printk
> > > mode. That way *all* the printk() interception code would end up in
> > > one place.
> > >
> >
> > Yes it would be better to have all printk() interception code at one
> > place. Let me see if I can come up with an integrated logic.
>
> It might be enough to move the kdb_check from vprintk_default()
> to vprintk_func().
>
> I have never used kdb. I did not know that it was able to stop
> kernel in any context.
>
> Would this work? It is only compile tested!

Thanks for this fix patch. It did resolve the issue and now I am able
to see the backtrace in the kdb shell. So we can go ahead with your
patch and drop mine.

>
> From 14ae6c9f0cbd1479cb898c864c7ab46e20f3cf6f Mon Sep 17 00:00:00 2001
> From: Petr Mladek <pmladek@xxxxxxxx>
> Date: Thu, 14 May 2020 10:37:44 +0200
> Subject: [PATCH] printk/kdb: Redirect printk messages into kdb in any context
>
> kdb is able to stop kernel even in NMI context where printk() is redirected
> to the printk_safe() lockless variant. Move the check and redirect to kdb
> even in this case.
>
> Signed-off-by: Petr Mladek <pmladek@xxxxxxxx>
> ---
> kernel/printk/printk.c | 14 +-------------
> kernel/printk/printk_safe.c | 8 ++++++++
> 2 files changed, 9 insertions(+), 13 deletions(-)
>

Reported-by: Sumit Garg <sumit.garg@xxxxxxxxxx>
Tested-by: Sumit Garg <sumit.garg@xxxxxxxxxx>

-Sumit

> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 9a9b6156270b..63a1aa377cd9 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -35,7 +35,6 @@
> #include <linux/memblock.h>
> #include <linux/syscalls.h>
> #include <linux/crash_core.h>
> -#include <linux/kdb.h>
> #include <linux/ratelimit.h>
> #include <linux/kmsg_dump.h>
> #include <linux/syslog.h>
> @@ -2036,18 +2035,7 @@ EXPORT_SYMBOL(vprintk);
>
> int vprintk_default(const char *fmt, va_list args)
> {
> - int r;
> -
> -#ifdef CONFIG_KGDB_KDB
> - /* Allow to pass printk() to kdb but avoid a recursion. */
> - if (unlikely(kdb_trap_printk && kdb_printf_cpu < 0)) {
> - r = vkdb_printf(KDB_MSGSRC_PRINTK, fmt, args);
> - return r;
> - }
> -#endif
> - r = vprintk_emit(0, LOGLEVEL_DEFAULT, NULL, 0, fmt, args);
> -
> - return r;
> + return vprintk_emit(0, LOGLEVEL_DEFAULT, NULL, 0, fmt, args);
> }
> EXPORT_SYMBOL_GPL(vprintk_default);
>
> diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c
> index d9a659a686f3..81734497c625 100644
> --- a/kernel/printk/printk_safe.c
> +++ b/kernel/printk/printk_safe.c
> @@ -6,6 +6,7 @@
> #include <linux/preempt.h>
> #include <linux/spinlock.h>
> #include <linux/debug_locks.h>
> +#include <linux/kdb.h>
> #include <linux/smp.h>
> #include <linux/cpumask.h>
> #include <linux/irq_work.h>
> @@ -359,6 +360,13 @@ void __printk_safe_exit(void)
>
> __printf(1, 0) int vprintk_func(const char *fmt, va_list args)
> {
> +#ifdef CONFIG_KGDB_KDB
> + /* Allow to pass printk() to kdb but avoid a recursion. */
> + if (unlikely(kdb_trap_printk && kdb_printf_cpu < 0)) {
> + return vkdb_printf(KDB_MSGSRC_PRINTK, fmt, args);
> + }
> +#endif
> +
> /*
> * Try to use the main logbuf even in NMI. But avoid calling console
> * drivers that might have their own locks.
> --
> 2.26.1
>