Re: [RFC PATCH 2/2] rcu,debug_core: allow the kernel debugger to reset the rcu stall timer

From: Paul E. McKenney
Date: Mon Aug 09 2010 - 13:44:23 EST


On Mon, Aug 09, 2010 at 12:12:12AM -0500, Jason Wessel wrote:
> When returning from the kernel debugger, allow a reset of the rcu
> jiffies_stall value to prevent the rcu stall detector from sending NMI
> events with stack dumps on all the cpus in the system.

Thank you for forwarding this!

A couple of questions below.

> Signed-off-by: Jason Wessel <jason.wessel@xxxxxxxxxxxxx>
> CC: Dipankar Sarma <dipankar@xxxxxxxxxx>
> CC: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
> CC: Ingo Molnar <mingo@xxxxxxx>
> ---
> include/linux/rcupdate.h | 8 ++++++++
> kernel/debug/debug_core.c | 2 ++
> kernel/rcutree.c | 9 +++++++++
> 3 files changed, 19 insertions(+), 0 deletions(-)
>
> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> index 9fbc54a..abd3ab6 100644
> --- a/include/linux/rcupdate.h
> +++ b/include/linux/rcupdate.h
> @@ -599,4 +599,12 @@ static inline void debug_rcu_head_unqueue(struct rcu_head *head)
> #define rcu_dereference_index_check(p, c) \
> 	__rcu_dereference_index_check((p), (c))
>
> +#ifdef CONFIG_RCU_CPU_STALL_DETECTOR
> +extern void rcu_cpu_stall_reset(void);
> +#else /* #ifdef CONFIG_RCU_CPU_STALL_DETECTOR */
> +static inline void rcu_cpu_stall_reset(void)
> +{
> +}
> +#endif /* #ifdef CONFIG_RCU_CPU_STALL_DETECTOR */
> +
> #endif /* __LINUX_RCUPDATE_H */
> diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c
> index e4d6819..1600e90 100644
> --- a/kernel/debug/debug_core.c
> +++ b/kernel/debug/debug_core.c
> @@ -47,6 +47,7 @@
> #include <linux/pid.h>
> #include <linux/smp.h>
> #include <linux/mm.h>
> +#include <linux/rcupdate.h>
>
> #include <asm/cacheflush.h>
> #include <asm/byteorder.h>
> @@ -474,6 +475,7 @@ static void dbg_touch_watchdogs(void)
> {
> 	touch_softlockup_watchdog_sync();
> 	clocksource_touch_watchdog();
> +	rcu_cpu_stall_reset();
> }
>
> static int kgdb_cpu_enter(struct kgdb_state *ks, struct pt_regs *regs)
> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> index d5bc439..209b755 100644
> --- a/kernel/rcutree.c
> +++ b/kernel/rcutree.c
> @@ -532,6 +532,9 @@ static void check_cpu_stall(struct rcu_state *rsp, struct rcu_data *rdp)
>
> 	if (rcu_cpu_stall_panicking)
> 		return;
> +	/* Gracefully handle a watch dog reset when jiffies_stall == 0 */
> +	if (!rsp->jiffies_stall)
> +		return;
> 	delta = jiffies - rsp->jiffies_stall;
> 	rnp = rdp->mynode;
> 	if ((rnp->qsmask & rdp->grpmask) && delta >= 0) {
> @@ -561,6 +564,12 @@ static void __init check_cpu_stall_init(void)
> atomic_notifier_chain_register(&panic_notifier_list, &rcu_panic_block);
> }
>
> +void rcu_cpu_stall_reset(void)
> +{
> +	rcu_sched_state.jiffies_stall = 0;
> +	rcu_bh_state.jiffies_stall = 0;
> +}
> +

OK, so you are suppressing RCU CPU stall warnings for rcu_sched and
rcu_bh, but not for preemptible RCU. I believe that you want all of
them covered.
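To make the coverage point concrete, here is a small userspace model (not kernel code) of the patch's zero-sentinel scheme with a third flavor added. rcu_preempt_state stands in for preemptible RCU, STALL_TIMEOUT is an assumed value, the per-CPU qsmask test is omitted, and simulate_debugger_stop() is a hypothetical driver. Treat it as a sketch of the idea, not as the actual rcutree code.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Userspace model only.  struct rcu_state, jiffies and the flavor names
 * mirror the patch; rcu_preempt_state is an assumed stand-in for
 * preemptible RCU and STALL_TIMEOUT is an assumed value.
 */
struct rcu_state {
	unsigned long jiffies_stall;	/* 0 == "reset by debugger" sentinel */
};

static unsigned long jiffies;		/* simulated tick counter */
static struct rcu_state rcu_sched_state, rcu_bh_state, rcu_preempt_state;

static struct rcu_state *const flavors[] = {
	&rcu_sched_state, &rcu_bh_state, &rcu_preempt_state,
};
#define NUM_FLAVORS (sizeof(flavors) / sizeof(flavors[0]))
#define STALL_TIMEOUT 60UL		/* assumed, in simulated jiffies */

/* Arm the stall timer, as the start of a grace period would. */
static void record_gp_stall_check_time(struct rcu_state *rsp)
{
	rsp->jiffies_stall = jiffies + STALL_TIMEOUT;
}

/* Returns 1 if this flavor would print a stall warning. */
static int check_cpu_stall(const struct rcu_state *rsp)
{
	if (!rsp->jiffies_stall)	/* reset by the debugger: skip */
		return 0;
	/* Signed difference keeps the comparison correct across jiffies wrap. */
	return (long)(jiffies - rsp->jiffies_stall) >= 0;
}

/* Reset every flavor, including preemptible RCU. */
static void rcu_cpu_stall_reset(void)
{
	for (size_t i = 0; i < NUM_FLAVORS; i++)
		flavors[i]->jiffies_stall = 0;
}

/*
 * Hypothetical driver: arm all flavors, sit in the debugger for
 * stopped_for ticks, optionally reset, and count flavors that would warn.
 */
static int simulate_debugger_stop(unsigned long stopped_for, int do_reset)
{
	int warnings = 0;

	for (size_t i = 0; i < NUM_FLAVORS; i++)
		record_gp_stall_check_time(flavors[i]);
	jiffies += stopped_for;
	if (do_reset)
		rcu_cpu_stall_reset();
	for (size_t i = 0; i < NUM_FLAVORS; i++)
		warnings += check_cpu_stall(flavors[i]);
	assert(warnings >= 0 && warnings <= (int)NUM_FLAVORS);
	return warnings;
}
```

In this model, a 300-tick stop without the reset makes all three flavors warn, and with the reset none of them do. Leaving any flavor out of rcu_cpu_stall_reset() would leave that flavor warning.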

I have a number of recent patches that allow RCU CPU stall warnings to
be suppressed, one of which allows them to be suppressed using sysfs.
Would that work for you, or do you need an in-kernel interface?
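For the sysfs route, a userspace tool could set the knob before a debugging session and clear it afterwards. The sketch below is hypothetical. The path in RCU_STALL_SUPPRESS_PARAM is an assumption about where such a parameter would appear, not something stated in this thread. The helpers only write and read back an integer, so they can be exercised against any ordinary file.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed location of the stall-suppression knob; verify on your kernel. */
#define RCU_STALL_SUPPRESS_PARAM \
	"/sys/module/rcutree/parameters/rcu_cpu_stall_suppress"

/* Write 1 (suppress) or 0 (re-enable). Returns 0 on success, -1 on error. */
static int set_rcu_stall_suppress(const char *path, int suppress)
{
	FILE *f;
	int ret = 0;

	assert(path != NULL);
	f = fopen(path, "w");
	if (!f)
		return -1;
	if (fprintf(f, "%d\n", suppress ? 1 : 0) < 0)
		ret = -1;
	/* Buffered data is written at fclose(), so write errors surface here. */
	if (fclose(f) != 0)
		ret = -1;
	return ret;
}

/* Read the current value back. Returns -1 on error. */
static int get_rcu_stall_suppress(const char *path)
{
	FILE *f;
	int val;

	assert(path != NULL);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (fscanf(f, "%d", &val) != 1)
		val = -1;
	fclose(f);
	return val;
}
```

To suppress, call set_rcu_stall_suppress(RCU_STALL_SUPPRESS_PARAM, 1). Writing 0 re-enables. Writing a real sysfs parameter normally requires root.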

If you do need an in-kernel interface, I could export (and probably
rename) rcu_panic(), which is a static function in 2.6.35. This assumes
that you never want to re-enable RCU CPU stall warnings once you suppress
them, which is what your patch appears to do.

So, if I export a suppress_rcu_cpu_stall() function that permanently
disables RCU CPU stall warnings, would that work for you? (They could
be manually re-enabled via sysfs.)
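Here is a minimal model of the proposed semantics. It assumes the exported function sets the same flag that check_cpu_stall() tests in the patch context above, and that the sysfs knob clears it. suppress_rcu_cpu_stall() is only the name floated here. rcu_stall_sysfs_store() and stall_warning_allowed() are hypothetical stand-ins for a sysfs store handler and for the early-return logic of check_cpu_stall().

```c
/*
 * Userspace model.  rcu_cpu_stall_panicking mirrors the flag checked in
 * check_cpu_stall(); the other names are hypothetical.
 */
static int rcu_cpu_stall_panicking;

/* Proposed export: suppress stall warnings until explicitly re-enabled. */
static void suppress_rcu_cpu_stall(void)
{
	rcu_cpu_stall_panicking = 1;
}

/* Hypothetical sysfs store handler: 0 re-enables, nonzero suppresses. */
static void rcu_stall_sysfs_store(int suppress)
{
	rcu_cpu_stall_panicking = suppress != 0;
}

/* Early-return logic of check_cpu_stall(), minus the per-CPU qsmask test. */
static int stall_warning_allowed(unsigned long jiffies,
				 unsigned long jiffies_stall)
{
	if (rcu_cpu_stall_panicking)
		return 0;
	return (long)(jiffies - jiffies_stall) >= 0;
}
```

In this model, suppression is a single explicit on/off state rather than a value stored in each rcu_state. Once set, it stays set until the sysfs-style store clears it.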

> #else /* #ifdef CONFIG_RCU_CPU_STALL_DETECTOR */
>
> static void record_gp_stall_check_time(struct rcu_state *rsp)
> --
> 1.6.3.3
>