Re: [PATCH 5/9] KGDB/KDB: add support for external NMI handler tocall KGDB/KDB.

From: Jason Wessel
Date: Fri Sep 06 2013 - 00:38:51 EST


On 09/05/2013 05:50 PM, Mike Travis wrote:
> This patch adds a kgdb_nmicallin() interface that can be used by
> external NMI handlers to call the KGDB/KDB handler. The primary need
> for this is for those types of NMI interrupts where all the CPUs
> have already received the NMI signal. Therefore no send_IPI(NMI)
> is required, and in fact it will cause a 2nd unhandled NMI to occur.
> This generates the "Dazed and Confuzed" messages.
>
> Since all the CPUs are getting the NMI at roughly the same time, it's not
> guaranteed that the first CPU that hits the NMI handler will manage to
> enter KGDB and set the dbg_master_lock before the slaves start entering.

It should have been ok to have more than one master if this was some kind of watch dog. The raw spin lock for the dbg_master_lock should have ensured that only a single CPU is in fact the master. If it is the case that we cannot send a nested IPI at this point, the UV machine type should have replaced the kgdb_roundup_cpus() routine with something that will work, such as looking at the exception type on the way in and perhaps skipping the IPI send.

Also if there is no possibility of restarting the machine from this state it would have been possible to simply turn off kgdb_do_roundup in the custom kgdb_roundup_cpus().

The patch you created appears that it will work, but it comes at the cost of some complexity because you are also checking on the state of "kgdb_info[cpu].send_ready" in some other location in the NMI handler. It might be better to consider not sending a nested NMI if all the CPUs are going to enter anyway in the master state.


>
> The new argument "send_ready" was added for KGDB to signal the NMI handler
> to release the slave CPUs for entry into KGDB.
>
> Signed-off-by: Mike Travis <travis@xxxxxxx>
> Reviewed-by: Dimitri Sivanich <sivanich@xxxxxxx>
> Reviewed-by: Hedi Berriche <hberrich@xxxxxxx>
> ---
> include/linux/kgdb.h | 1 +
> kernel/debug/debug_core.c | 41 +++++++++++++++++++++++++++++++++++++++++
> kernel/debug/debug_core.h | 1 +
> 3 files changed, 43 insertions(+)
>
> --- linux.orig/include/linux/kgdb.h
> +++ linux/include/linux/kgdb.h
> @@ -310,6 +310,7 @@ extern int
> kgdb_handle_exception(int ex_vector, int signo, int err_code,
> struct pt_regs *regs);
> extern int kgdb_nmicallback(int cpu, void *regs);
> +extern int kgdb_nmicallin(int cpu, int trapnr, void *regs, atomic_t *snd_rdy);
> extern void gdbstub_exit(int status);
>
> extern int kgdb_single_step;
> --- linux.orig/kernel/debug/debug_core.c
> +++ linux/kernel/debug/debug_core.c
> @@ -578,6 +578,10 @@ return_normal:
> /* Signal the other CPUs to enter kgdb_wait() */
> if ((!kgdb_single_step) && kgdb_do_roundup)
> kgdb_roundup_cpus(flags);
> +
> + /* If optional send ready pointer, signal CPUs to proceed */
> + if (kgdb_info[cpu].send_ready)
> + atomic_set(kgdb_info[cpu].send_ready, 1);
> #endif
>
> /*
> @@ -729,6 +733,43 @@ int kgdb_nmicallback(int cpu, void *regs
> return 0;
> }
> #endif
> + return 1;
> +}
> +
> +int kgdb_nmicallin(int cpu, int trapnr, void *regs, atomic_t *send_ready)
> +{
> +#ifdef CONFIG_SMP
> + if (!kgdb_io_ready(0))
> + return 1;
> +
> + if (kgdb_info[cpu].enter_kgdb == 0) {
> + struct kgdb_state kgdb_var;
> + struct kgdb_state *ks = &kgdb_var;
> + int save_kgdb_do_roundup = kgdb_do_roundup;
> +
> + memset(ks, 0, sizeof(struct kgdb_state));
> + ks->cpu = cpu;
> + ks->ex_vector = trapnr;
> + ks->signo = SIGTRAP;
> + ks->err_code = 0;
> + ks->kgdb_usethreadid = 0;
> + ks->linux_regs = regs;
> +
> + /* Do not broadcast NMI */
> + kgdb_do_roundup = 0;
> +
> + /* Indicate there are slaves waiting */
> + kgdb_info[cpu].send_ready = send_ready;
> + kgdb_cpu_enter(ks, regs, DCPU_WANT_MASTER);

This is the one part of the patch I don't quite understand. Why does the kgdb_nmicallin() desire to be the master core?

It was not obvious the circumstance as to why this is called. Is it some kind of watch dog where you really do want to enter the debugger or is it more to deal with nested slave interrupts were the round up would have possibly hung on this hardware. If it is the later, I would have thought this should be a slave and not the master.

Perhaps a comment in the code can clear this up?

Thanks,
Jason.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/