Re: [PATCH v3 4/4] kdb: Don't back trace on a cpu that didn't round up

From: Daniel Thompson
Date: Wed Nov 07 2018 - 07:30:41 EST


On Tue, Nov 06, 2018 at 05:00:28PM -0800, Douglas Anderson wrote:
> If you have a CPU that fails to round up and then run 'btc' you'll end
> up crashing in kdb because we dereferenced NULL. Let's add a check.
> It's wise to also set the task to NULL when leaving the debugger so
> that if we fail to round up on a later entry into the debugger we
> won't backtrace a stale task.
>
> Signed-off-by: Douglas Anderson <dianders@xxxxxxxxxxxx>
> ---
>
> Changes in v3:
> - New for v3.
>
> Changes in v2: None
>
> kernel/debug/debug_core.c | 1 +
> kernel/debug/kdb/kdb_bt.c | 11 ++++++++++-
> 2 files changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c
> index 324cba8917f1..08851077c20a 100644
> --- a/kernel/debug/debug_core.c
> +++ b/kernel/debug/debug_core.c
> @@ -587,6 +587,7 @@ static int kgdb_cpu_enter(struct kgdb_state *ks, struct pt_regs *regs,
> kgdb_info[cpu].exception_state &=
> ~(DCPU_WANT_MASTER | DCPU_IS_SLAVE);
> kgdb_info[cpu].enter_kgdb--;
> + kgdb_info[cpu].task = NULL;

This code isn't quite right.

In particular, there are two exit paths from kgdb_cpu_enter(), and this
one is only taken when a slave exits (and the master may change the next
time we re-enter kgdb). It also looks very odd to have an unconditional
clear next to a decrement that takes account of debugger re-entrancy.
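To make the re-entrancy point concrete, here is a minimal user-space model (not kernel code). The struct and the model_* functions are hypothetical stand-ins for the task and enter_kgdb fields of kgdb_info[], and the model assumes a nested entry on the same CPU records the same task. It contrasts clearing the task on every exit with clearing it only once the outermost entry unwinds.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one kgdb_info[] entry; only the fields
 * relevant to re-entrancy are modelled. */
struct cpu_dbg_state {
	void *task;       /* task recorded when the CPU entered the debugger */
	int enter_kgdb;   /* nesting depth of debugger entries on this CPU */
};

/* Entry records the task and bumps the nesting depth. */
static void model_enter(struct cpu_dbg_state *s, void *task)
{
	s->task = task;
	s->enter_kgdb++;
}

/* Mirrors the patch: the task is cleared on every exit, even a nested one. */
static void model_exit_unconditional(struct cpu_dbg_state *s)
{
	assert(s->enter_kgdb > 0);
	s->enter_kgdb--;
	s->task = NULL;
}

/* Re-entrancy aware: the task is only cleared once the outermost entry
 * has unwound, i.e. when the depth drops back to zero. */
static void model_exit_nesting_aware(struct cpu_dbg_state *s)
{
	assert(s->enter_kgdb > 0);
	s->enter_kgdb--;
	if (s->enter_kgdb == 0)
		s->task = NULL;
}
```

With two nested entries, the unconditional variant leaves the outer entry (depth 1) with a NULL task while it is still inside the debugger; the nesting-aware variant keeps the task until the depth returns to zero.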

Note also that there is similar code in kdb_debugger.c (search for "zero
out any offline cpu data"), which should be tidied up if we decide to do
the same cleanup in a different way.
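For comparison, a rough user-space model of the kind of sweep that comment describes: per-CPU state is cleared only for CPUs that are not online. The flat arrays, MODEL_NR_CPUS and zero_offline_cpu_data() are hypothetical; the kernel version works on kgdb_info[] using the kernel's CPU iterators instead. The model also shows that such a sweep never touches an online CPU, so it cannot help with an online CPU that failed to round up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_CPUS 4 /* hypothetical CPU count for the model */

/* Hypothetical stand-in for one kgdb_info[] entry. */
struct cpu_dbg_state {
	void *task;         /* task recorded on the last debugger entry */
	void *debuggerinfo; /* saved registers from the last debugger entry */
};

/* Clear the saved pointers of every CPU that is not online. Online CPUs
 * are left alone, whether or not they rounded up on this entry. */
static void zero_offline_cpu_data(struct cpu_dbg_state state[],
				  const bool online[], size_t nr_cpus)
{
	assert(state != NULL && online != NULL);
	for (size_t cpu = 0; cpu < nr_cpus; cpu++) {
		if (!online[cpu]) {
			state[cpu].task = NULL;
			state[cpu].debuggerinfo = NULL;
		}
	}
}
```

If CPU 2 in a setup like the one below had failed to round up on this entry, its pointers would still refer to whatever it recorded last time, which is the stale-task case from the commit message.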

I'll leave it to you whether to fix the existing code or add new code
here, but if you do add it in kgdb_cpu_enter() it must cover both return
paths and clear debuggerinfo as well, and the code in kdb_debugger.c
will need to be tidied up.
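One possible shape for new code here, again as a hypothetical user-space model rather than a proposed patch: a single helper releases the per-CPU state, it is called on both the slave and the master return path, and it clears debuggerinfo alongside task. The names cpu_dbg_leave() and model_cpu_enter(), their return values, and the depth check (following the re-entrancy concern above) are all assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one kgdb_info[] entry. */
struct cpu_dbg_state {
	void *task;         /* task recorded on debugger entry */
	void *debuggerinfo; /* saved registers recorded on debugger entry */
	int enter_kgdb;     /* nesting depth of debugger entries */
};

/* The one place per-CPU debugger state is released. Both fields are
 * cleared together, and only when the outermost entry unwinds. */
static void cpu_dbg_leave(struct cpu_dbg_state *s)
{
	assert(s->enter_kgdb > 0);
	s->enter_kgdb--;
	if (s->enter_kgdb == 0) {
		s->task = NULL;
		s->debuggerinfo = NULL;
	}
}

/* Rough model of a function with separate slave and master exits; the
 * return values are arbitrary markers for which path was taken. */
static int model_cpu_enter(struct cpu_dbg_state *s, void *task, void *regs,
			   bool is_master)
{
	s->task = task;
	s->debuggerinfo = regs;
	s->enter_kgdb++;

	if (!is_master) {
		/* ... slave waits here until the master releases it ... */
		cpu_dbg_leave(s);
		return 0; /* slave return path */
	}

	/* ... master runs the debugger session here ... */
	cpu_dbg_leave(s);
	return 1; /* master return path */
}
```

Because both returns go through the same helper, a CPU that was master on one entry and a slave on the next cannot leave a stale task or register pointer behind on either path.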


Daniel.


> smp_mb__before_atomic();
> atomic_dec(&slaves_in_kgdb);
> dbg_touch_watchdogs();
> diff --git a/kernel/debug/kdb/kdb_bt.c b/kernel/debug/kdb/kdb_bt.c
> index 7921ae4fca8d..7e2379aa0a1e 100644
> --- a/kernel/debug/kdb/kdb_bt.c
> +++ b/kernel/debug/kdb/kdb_bt.c
> @@ -186,7 +186,16 @@ kdb_bt(int argc, const char **argv)
> kdb_printf("btc: cpu status: ");
> kdb_parse("cpu\n");
> for_each_online_cpu(cpu) {
> - sprintf(buf, "btt 0x%px\n", KDB_TSK(cpu));
> + void *kdb_tsk = KDB_TSK(cpu);
> +
> + /* If a CPU failed to round up we could be here */
> + if (!kdb_tsk) {
> + kdb_printf("WARNING: no task for cpu %ld\n",
> + cpu);
> + continue;
> + }
> +
> + sprintf(buf, "btt 0x%px\n", kdb_tsk);
> kdb_parse(buf);
> touch_nmi_watchdog();
> }
> --
> 2.19.1.930.g4563a0d9d0-goog
>