[PATCH] tracing: Have stack tracer call rcu_nmi_enter() only when RCU is not watching

From: Steven Rostedt
Date: Thu Sep 21 2017 - 13:21:50 EST



Testing some other code, this RCU splat triggered:

=============================
WARNING: suspicious RCU usage
4.13.0-rc7-test+ #117 Tainted: G W
-----------------------------
arch/x86/kernel/traps.c:305 entry code didn't wake RCU!

other info that might help us debug this:

RCU used illegally from idle CPU!
rcu_scheduler_active = 2, debug_locks = 1
RCU used illegally from extended quiescent state!
no locks held by swapper/1/0.

stack backtrace:
CPU: 1 PID: 0 Comm: swapper/1 Tainted: G W 4.13.0-rc7-test+ #117
Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016
Call Trace:
dump_stack+0x86/0xcf
lockdep_rcu_suspicious+0xc5/0x100
do_error_trap+0x125/0x130
? do_error_trap+0x5/0x130
? trace_hardirqs_off_thunk+0x1a/0x1c
? do_invalid_op+0x5/0x30
do_invalid_op+0x20/0x30
invalid_op+0x1e/0x30
RIP: 0010:module_assert_mutex_or_preempt+0x34/0x40
RSP: 0018:ffffc900006abc58 EFLAGS: 00010046
RAX: 0000000000000000 RBX: ffffffffa000a077 RCX: 0000000000000002
RDX: 0000000000000000 RSI: 00000000ffffffff RDI: 0000000000000046
RBP: ffffc900006abc58 R08: ffffc900006abf40 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: ffff8801188d8040 R15: ffffffff81ed5720
? 0xffffffffa000a077
? module_assert_mutex_or_preempt+0x30/0x40
__module_address+0x2c/0xf0
? 0xffffffffa000a077
__module_text_address+0x12/0x60
? 0xffffffffa000a077
is_module_text_address+0x1f/0x50
? 0xffffffffa000a077
__kernel_text_address+0x30/0x90
unwind_get_return_address+0x1f/0x30
__save_stack_trace+0x83/0xd0
? 0xffffffffa000a077
? rcu_dynticks_eqs_exit+0x5/0x40
save_stack_trace+0x1b/0x20
check_stack+0xf8/0x2f0
? rcu_dynticks_eqs_enter+0x30/0x30
stack_trace_call+0x6e/0x80
0xffffffffa000a077
? ftrace_graph_caller+0x78/0xa8
? rcu_dynticks_eqs_exit+0x5/0x40
rcu_dynticks_eqs_exit+0x5/0x40
rcu_idle_exit+0xdf/0xf0
? rcu_dynticks_eqs_exit+0x5/0x40
? rcu_idle_exit+0xdf/0xf0
do_idle+0x128/0x200
cpu_startup_entry+0x1d/0x20
start_secondary+0x108/0x130
secondary_startup_64+0x9f/0x9f

What happened, was that the CPU was coming out of idle, and
rcu_dynticks_eqs_exit() was traced by the stack tracer. The stack was bigger
than the current recorded max stack so the stack tracer triggered a stack
trace calling save_stack_trace(). As save_stack_trace() performs RCU logic,
the stack tracer called rcu_irq_enter() before calling save_stack_trace().
There is two problems here.

1) rcu_irq_enter() is called unconditionally

Talking with Paul McKenney, it should only be called if rcu_is_watching()
returns false.

2) rcu_irq_enter() appears to not function well in this location of the RCU
internal logic (tracing rcu_dynticks_eqs_exit()), and causes this bug.

The solution is two fold.

1) check if RCU is watching, and only wake RCU if it is not
(solves the first issue)

2) call rcu_nmi_enter() instead of rcu_irq_enter(). It makes more sense as
tracing acts more like NMIs and not IRQs (can be called when interrupts
are disabled. (solves the second issue).

Suggested-by: "Paul E. McKenney" <paulmck@xxxxxxxxxxxxxxxxxx>
Signed-off-by: Steven Rostedt (VMware) <rostedt@xxxxxxxxxxx>
---
kernel/trace/trace_stack.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/kernel/trace/trace_stack.c b/kernel/trace/trace_stack.c
index a4df67c..2462e11 100644
--- a/kernel/trace/trace_stack.c
+++ b/kernel/trace/trace_stack.c
@@ -77,6 +77,7 @@ check_stack(unsigned long ip, unsigned long *stack)
{
unsigned long this_size, flags; unsigned long *p, *top, *start;
static int tracer_frame;
+ bool no_rcu;
int frame_size = ACCESS_ONCE(tracer_frame);
int i, x;

@@ -104,14 +105,18 @@ check_stack(unsigned long ip, unsigned long *stack)
if (unlikely(rcu_irq_enter_disabled()))
return;

+ no_rcu = !rcu_is_watching();
+
local_irq_save(flags);
arch_spin_lock(&stack_trace_max_lock);

/*
* RCU may not be watching, make it see us.
* The stack trace code uses rcu_sched.
+ * Note, tracing acts more like an NMI than an IRQ.
*/
- rcu_irq_enter();
+ if (no_rcu)
+ rcu_nmi_enter();

/* In case another CPU set the tracer_frame on us */
if (unlikely(!frame_size))
@@ -205,7 +210,8 @@ check_stack(unsigned long ip, unsigned long *stack)
}

out:
- rcu_irq_exit();
+ if (no_rcu)
+ rcu_nmi_exit();
arch_spin_unlock(&stack_trace_max_lock);
local_irq_restore(flags);
}
--
2.9.5