Re: [PATCH v2] kdb: Fix stack crawling on 'running' CPUs that aren't the master

From: Daniel Thompson
Date: Fri Aug 30 2019 - 10:52:45 EST


On Wed, Jul 31, 2019 at 11:37:32AM -0700, Douglas Anderson wrote:
> In kdb when you do 'btc' (back trace on CPU) it doesn't necessarily
> give you the right info. Specifically on many architectures
> (including arm64, where I tested) you can't dump the stack of a
> "running" process that isn't the process running on the current CPU.
> This can be seen by this:
>
> echo SOFTLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT
> # wait 2 seconds
> <sysrq>g
>
> Here's what I see now on rk3399-gru-kevin. I see the stack crawl for
> the CPU that handled the sysrq but everything else just shows me stuck
> in __switch_to() which is bogus:
>
> ======
>
> [0]kdb> btc
> btc: cpu status: Currently on cpu 0
> Available cpus: 0, 1-3(I), 4, 5(I)
> Stack traceback for pid 0
> 0xffffff801101a9c0 0 0 1 0 R 0xffffff801101b3b0 *swapper/0
> Call trace:
> dump_backtrace+0x0/0x138
> ...
> kgdb_compiled_brk_fn+0x34/0x44
> ...
> sysrq_handle_dbg+0x34/0x5c
> Stack traceback for pid 0
> 0xffffffc0f175a040 0 0 1 1 I 0xffffffc0f175aa30 swapper/1
> Call trace:
> __switch_to+0x1e4/0x240
> 0xffffffc0f65616c0
> Stack traceback for pid 0
> 0xffffffc0f175d040 0 0 1 2 I 0xffffffc0f175da30 swapper/2
> Call trace:
> __switch_to+0x1e4/0x240
> 0xffffffc0f65806c0
> Stack traceback for pid 0
> 0xffffffc0f175b040 0 0 1 3 I 0xffffffc0f175ba30 swapper/3
> Call trace:
> __switch_to+0x1e4/0x240
> 0xffffffc0f659f6c0
> Stack traceback for pid 1474
> 0xffffffc0dde8b040 1474 727 1 4 R 0xffffffc0dde8ba30 bash
> Call trace:
> __switch_to+0x1e4/0x240
> __schedule+0x464/0x618
> 0xffffffc0dde8b040
> Stack traceback for pid 0
> 0xffffffc0f17b0040 0 0 1 5 I 0xffffffc0f17b0a30 swapper/5
> Call trace:
> __switch_to+0x1e4/0x240
> 0xffffffc0f65dd6c0
>
> ===
>
> The problem is that 'btc' eventually boils down to
> show_stack(task_struct, NULL);
>
> ...and show_stack() doesn't work for "running" CPUs because their
> registers haven't been stashed.
>
> On x86 things might work better (I haven't tested) because kdb has a
> special case for x86 in kdb_show_stack() where it passes the stack
> pointer to show_stack(). This wouldn't work on arm64 where the stack
> crawling function seems needs the "fp" and "pc", not the "sp" which is
> presumably why arm64's show_stack() function totally ignores the "sp"
> parameter.
>
> NOTE: we _can_ get a good stack dump for all the cpus if we manually
> switch each one to the kdb master and do a back trace. AKA:
> cpu 4
> bt
> ...will give the expected trace. That's because now arm64's
> dump_backtrace will now see that "tsk == current" and go through a
> different path.
>
> In this patch I fix the problems by catching a request to stack crawl
> a task that's running on a CPU and then I ask that CPU to do the stack
> crawl.
>
> NOTE: this will (presumably) change what stack crawls are printed for
> x86 machines. Now kdb functions will show up in the stack crawl.
> Presumably this is OK but if it's not we can go back and add a special
> case for x86 again.
>
> Signed-off-by: Douglas Anderson <dianders@xxxxxxxxxxxx>

I think this approach can be made work but there are problems as things
exist today, see below.


> ---
>
> Changes in v2:
> - Totally new approach; now arch agnostic.
>
> kernel/debug/debug_core.c | 5 +++++
> kernel/debug/debug_core.h | 1 +
> kernel/debug/kdb/kdb_bt.c | 44 ++++++++++++++++++++++++++++++---------
> 3 files changed, 40 insertions(+), 10 deletions(-)
>
> diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c
> index 5cc608de6883..a89c72714fe6 100644
> --- a/kernel/debug/debug_core.c
> +++ b/kernel/debug/debug_core.c
> @@ -92,6 +92,8 @@ static int kgdb_use_con;
> bool dbg_is_early = true;
> /* Next cpu to become the master debug core */
> int dbg_switch_cpu;
> +/* cpu number of slave we request a stack crawl of */
> +int dbg_slave_dumpstack_cpu = -1;
>
> /* Use kdb or gdbserver mode */
> int dbg_kdb_mode = 1;
> @@ -580,6 +582,9 @@ static int kgdb_cpu_enter(struct kgdb_state *ks, struct pt_regs *regs,
> atomic_xchg(&kgdb_active, cpu);
> break;
> }
> + } else if (dbg_slave_dumpstack_cpu == cpu) {

Couldn't this be encoded in the exception state?


> + dump_stack();
> + dbg_slave_dumpstack_cpu = -1;

> } else if (kgdb_info[cpu].exception_state & DCPU_IS_SLAVE) {
> if (!raw_spin_is_locked(&dbg_slave_lock))
> goto return_normal;
> diff --git a/kernel/debug/debug_core.h b/kernel/debug/debug_core.h
> index b4a7c326d546..dca74d5caef2 100644
> --- a/kernel/debug/debug_core.h
> +++ b/kernel/debug/debug_core.h
> @@ -62,6 +62,7 @@ extern int dbg_io_get_char(void);
> /* Switch from one cpu to another */
> #define DBG_SWITCH_CPU_EVENT -123456
> extern int dbg_switch_cpu;
> +extern int dbg_slave_dumpstack_cpu;
>
> /* gdbstub interface functions */
> extern int gdb_serial_stub(struct kgdb_state *ks);
> diff --git a/kernel/debug/kdb/kdb_bt.c b/kernel/debug/kdb/kdb_bt.c
> index 7e2379aa0a1e..10095ae05826 100644
> --- a/kernel/debug/kdb/kdb_bt.c
> +++ b/kernel/debug/kdb/kdb_bt.c
> @@ -10,6 +10,7 @@
> */
>
> #include <linux/ctype.h>
> +#include <linux/delay.h>
> #include <linux/string.h>
> #include <linux/kernel.h>
> #include <linux/sched/signal.h>
> @@ -22,20 +23,43 @@
> static void kdb_show_stack(struct task_struct *p, void *addr)
> {
> int old_lvl = console_loglevel;
> + int time_left;
> + int cpu;
> +
> console_loglevel = CONSOLE_LOGLEVEL_MOTORMOUTH;
> kdb_trap_printk++;
> - kdb_set_current_task(p);
> - if (addr) {
> - show_stack((struct task_struct *)p, addr);
> - } else if (kdb_current_regs) {
> -#ifdef CONFIG_X86
> - show_stack(p, &kdb_current_regs->sp);
> -#else
> - show_stack(p, NULL);
> -#endif
> +
> + if (!addr && kdb_task_has_cpu(p)) {
> + cpu = kdb_process_cpu(p);
> +
> + if (cpu == raw_smp_processor_id()) {
> + dump_stack();
> + goto exit;

This goto is not for error recovery but looks like it is. Why can't we
use normal control flow here (extracting the remote stack dump logic
into a seperate function if the right margin is getting too close)?

In fact to be honest a function call would be useful anyway since I'd
rather have all the resulting horror in a single file (debug_core.c).


> + }
> +
> + /*
> + * In general architectures don't support dumping the stack
> + * of a "running" process that's not the current one so if
> + * we want to dump the stack of a running process that's not
> + * the master then we'll set a global letting the slave
> + * (which should be looping) know to dump its own stack.
> + */
> + dbg_slave_dumpstack_cpu = cpu;
> + for (time_left = MSEC_PER_SEC; time_left; time_left--) {
> + udelay(1000);
> + if (dbg_slave_dumpstack_cpu == -1)
> + break;
> + }

This timeout does not interact correctly with the pager (the timer does
not get reset when we sit in the pager loop waiting for user to tell us
to continue).


> + if (dbg_slave_dumpstack_cpu != -1) {
> + kdb_printf("ERROR: Timeout dumping CPU %d stack\n",
> + cpu);
> + dbg_slave_dumpstack_cpu = -1;
> + }
> } else {
> - show_stack(p, NULL);
> + show_stack(p, addr);
> }
> +
> +exit:
> console_loglevel = old_lvl;
> kdb_trap_printk--;
> }
> --
> 2.22.0.770.g0f2c4a37fd-goog