Re: [RFC PATCH 2/2] rcu,debug_core: allow the kernel debugger to reset the rcu stall timer

From: Paul E. McKenney
Date: Tue Jun 19 2018 - 17:17:01 EST


On Mon, Aug 09, 2010 at 12:12:12AM -0500, Jason Wessel wrote:
> When returning from the kernel debugger, allow a reset of the rcu
> jiffies_stall value to prevent the rcu stall detector from sending NMI
> events that trigger stack dumps on all the cpus in the system.

Not sure where the 2010 date came from, but it almost fooled me into
deleting your emails unread. ;-)

> Signed-off-by: Jason Wessel <jason.wessel@xxxxxxxxxxxxx>
> CC: Dipankar Sarma <dipankar@xxxxxxxxxx>
> CC: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
> CC: Ingo Molnar <mingo@xxxxxxx>
> ---
> include/linux/rcupdate.h | 8 ++++++++
> kernel/debug/debug_core.c | 2 ++
> kernel/rcutree.c | 9 +++++++++
> 3 files changed, 19 insertions(+), 0 deletions(-)
>
> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> index 9fbc54a..abd3ab6 100644
> --- a/include/linux/rcupdate.h
> +++ b/include/linux/rcupdate.h
> @@ -599,4 +599,12 @@ static inline void debug_rcu_head_unqueue(struct rcu_head *head)
> #define rcu_dereference_index_check(p, c) \
> __rcu_dereference_index_check((p), (c))
>
> +#ifdef CONFIG_RCU_CPU_STALL_DETECTOR
> +extern void rcu_cpu_stall_reset(void);
> +#else /* #ifdef CONFIG_RCU_CPU_STALL_DETECTOR */
> +static inline void rcu_cpu_stall_reset(void)
> +{
> +}
> +#endif /* #ifdef CONFIG_RCU_CPU_STALL_DETECTOR */
> +
> #endif /* __LINUX_RCUPDATE_H */
> diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c
> index e4d6819..1600e90 100644
> --- a/kernel/debug/debug_core.c
> +++ b/kernel/debug/debug_core.c
> @@ -47,6 +47,7 @@
> #include <linux/pid.h>
> #include <linux/smp.h>
> #include <linux/mm.h>
> +#include <linux/rcupdate.h>
>
> #include <asm/cacheflush.h>
> #include <asm/byteorder.h>
> @@ -474,6 +475,7 @@ static void dbg_touch_watchdogs(void)
> {
> touch_softlockup_watchdog_sync();
> clocksource_touch_watchdog();
> + rcu_cpu_stall_reset();
> }
>
> static int kgdb_cpu_enter(struct kgdb_state *ks, struct pt_regs *regs)
> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> index d5bc439..209b755 100644
> --- a/kernel/rcutree.c
> +++ b/kernel/rcutree.c
> @@ -532,6 +532,9 @@ static void check_cpu_stall(struct rcu_state *rsp, struct rcu_data *rdp)
>
> if (rcu_cpu_stall_panicking)
> return;
> + /* Gracefully handle a watchdog reset when jiffies_stall == 0 */
> + if (!rsp->jiffies_stall)
> + return;

Why not just use the existing rcu_cpu_stall_reset()? It sets the next
stall-warning deadline a long way into the future: roughly half the
range of jiffies, which is about 2 billion jiffies on 32-bit systems.

> delta = jiffies - rsp->jiffies_stall;
> rnp = rdp->mynode;
> if ((rnp->qsmask & rdp->grpmask) && delta >= 0) {
> @@ -561,6 +564,12 @@ static void __init check_cpu_stall_init(void)
> atomic_notifier_chain_register(&panic_notifier_list, &rcu_panic_block);
> }
>
> +void rcu_cpu_stall_reset(void)
> +{
> + rcu_sched_state.jiffies_stall = 0;
> + rcu_bh_state.jiffies_stall = 0;

This should get you a compiler warning given the existing
rcu_cpu_stall_reset(). It also fails to do anything about
rcu_preempt_state on PREEMPT=y kernels.

What happens if you just remove the rcutree.c changes from your
series and test with the result?

Thanx, Paul

> +}
> +
> #else /* #ifdef CONFIG_RCU_CPU_STALL_DETECTOR */
>
> static void record_gp_stall_check_time(struct rcu_state *rsp)
> --
> 1.6.3.3
>