Re: [BUG] stack tracing causes: kernel/module.c:271 module_assert_mutex_or_preempt

From: Paul E. McKenney
Date: Wed Apr 05 2017 - 12:26:12 EST


On Wed, Apr 05, 2017 at 09:32:07AM -0400, Steven Rostedt wrote:
> Hi Paul,
>
> My tests are triggering the following when I have lockdep enabled and
> running the stack tracer:
>
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 0 at /work/autotest/nobackup/linux-test.git/kernel/module.c:271 module_assert_mutex_or_preempt+0x4f/0x51
> Modules linked in: ppdev parport_pc parport [last unloaded: trace_events_sample]
> CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.11.0-rc3-test+ #48
> Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014
> Call Trace:
>
> ===============================
> [ ERR: suspicious RCU usage. ]
> 4.11.0-rc3-test+ #48 Not tainted
> -------------------------------
> /work/autotest/nobackup/linux-test.git/include/linux/rcupdate.h:837 rcu_read_lock() used illegally while idle!
>
> other info that might help us debug this:
>
>
> RCU used illegally from idle CPU!
> rcu_scheduler_active = 2, debug_locks = 0
> RCU used illegally from extended quiescent state!
> 1 lock held by swapper/0/0:
> #0: (rcu_read_lock){......}, at: [<ffffffff810ebee3>] rcu_read_lock+0x0/0x61
>
> stack backtrace:
> CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.11.0-rc3-test+ #48
> Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014
> Call Trace:
> dump_stack+0x68/0x92
> __warn+0xc2/0xdd
> ? 0xffffffffa0026077
> warn_slowpath_null+0x1d/0x1f
> module_assert_mutex_or_preempt+0x4f/0x51
> __module_address+0x2a/0xac
> ? 0xffffffffa0026077
> __module_text_address+0x12/0x59
> ? 0xffffffffa0026077
> is_module_text_address+0xe/0x16
> __kernel_text_address+0x2b/0x80
> ? 0xffffffffa0026077
> unwind_get_return_address+0x50/0x5c
> __save_stack_trace+0x76/0xbf
> ? rcu_eqs_enter_common.constprop.71+0x5/0x108
> save_stack_trace+0x1b/0x1d
> check_stack+0xec/0x24a
> stack_trace_call+0x40/0x53 <<-- This is the start of the stack tracer
> 0xffffffffa0026077
> ? ftrace_graph_caller+0x78/0xa8
> ? trace_hardirqs_off+0xd/0xf
> ? rcu_eqs_enter_common.constprop.71+0x5/0x108
> rcu_eqs_enter_common.constprop.71+0x5/0x108 <<-- It appears to be testing the rcu internal code
> rcu_idle_enter+0x51/0x72
> ? rcu_eqs_enter_common.constprop.71+0x5/0x108
> ? rcu_idle_enter+0x51/0x72
> do_idle+0xbb/0x1e5
> cpu_startup_entry+0x1f/0x21
> rest_init+0x127/0x12d
> start_kernel+0x3f5/0x402
> x86_64_start_reservations+0x2f/0x31
> x86_64_start_kernel+0x17f/0x192
> start_cpu+0x14/0x14
> ? start_cpu+0x14/0x14
> ---[ end trace eeaad31bc52457e7 ]---
>
>
> Now the check_stack() function in kernel/trace/trace_stack.c has the
> following:
>
> /*
> * RCU may not be watching, make it see us.
> * The stack trace code uses rcu_sched.
> */
> rcu_irq_enter();
>
>
> What I'm guessing is that because the stack tracer is tracing the
> internals of the RCU code itself, and checking its stack, the stack
> trace triggered in a location where calling rcu_irq_enter() isn't
> sufficient to make RCU watch us.
>
> I may add code to check whether RCU is indeed watching after the
> rcu_irq_enter() call, and if it still isn't, simply exit out.
>
> Thoughts?
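Concretely, that bail-out could look roughly like the following user-space sketch. The `_stub` functions and `rcu_watching` flag are hypothetical stand-ins for the kernel's rcu_irq_enter(), rcu_is_watching(), and rcu_irq_exit(); this is an illustration of the proposed guard, not kernel code:

```c
/* Stand-ins for the kernel's RCU hooks; in a real kernel build these
 * would be rcu_irq_enter(), rcu_is_watching(), and rcu_irq_exit(). */
static int rcu_watching;
static void rcu_irq_enter_stub(void) { /* may fail to make RCU watch us */ }
static void rcu_irq_exit_stub(void) { }
static int rcu_is_watching_stub(void) { return rcu_watching; }

/* Sketch of the proposed guard at the top of check_stack().
 * Returns nonzero if the stack check actually ran. */
int check_stack_guarded(void)
{
	rcu_irq_enter_stub();
	if (!rcu_is_watching_stub()) {
		/* rcu_irq_enter() was not enough; bail out rather than
		 * use RCU from an extended quiescent state. */
		rcu_irq_exit_stub();
		return 0;
	}
	/* ... perform the actual stack-size check here ... */
	rcu_irq_exit_stub();
	return 1;
}
```

The point of the guard is that skipping one stack sample is harmless, whereas using RCU while it is not watching is not.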

Ouch!!!

I really blew it telling you that you can use rcu_irq_enter()!!!

RCU's eqs code disables interrupts, but it calls into the trace code at
exactly the point where it cannot handle an interrupt. Your rcu_irq_enter()
therefore breaks things: by that point, ->dynticks_nesting has already been
set to zero, but ->dynticks still indicates non-idle. rcu_irq_enter() thus
sees zero nesting, concludes that it is exiting idle, and incorrectly
increments ->dynticks, whose new value tells RCU "idle". The later
rcu_is_watching() checks therefore complain bitterly.
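To make that sequence concrete, here is a minimal user-space model of the two counters. It assumes the 4.11-era encoding in which an odd ->dynticks value means RCU is watching; it is purely illustrative, not the kernel implementation:

```c
/* Toy model of the per-CPU state: odd dynticks == RCU watching. */
static long dynticks_nesting;
static long dynticks;

static int model_rcu_is_watching(void)
{
	return dynticks & 1;
}

/* Model of rcu_irq_enter(): zero nesting is taken to mean "was idle",
 * so the parity of dynticks is flipped to say "non-idle". */
static void model_rcu_irq_enter(void)
{
	if (dynticks_nesting == 0)
		dynticks++;	/* believed idle exit: flips parity */
	dynticks_nesting++;
}

/* The vulnerable window in eqs entry: nesting is zeroed before the
 * tracepoint fires, while dynticks still says "watching". Returns
 * what a tracer calling rcu_irq_enter() then sees. */
int tracer_view_in_buggy_window(void)
{
	dynticks_nesting = 1;	/* process level, non-idle */
	dynticks = 1;		/* odd: RCU watching */
	dynticks_nesting = 0;	/* eqs entry zeroes nesting first... */
	/* ...trace_rcu_dyntick() fires here; the stack tracer hooks it: */
	model_rcu_irq_enter();	/* flips dynticks to even: "idle"! */
	return model_rcu_is_watching();
}
```

Running this, the tracer finds RCU apparently idle even though the CPU has not yet reached idle, which is exactly what the lockdep splat above is reporting.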

I really don't want you to have to hack around this. So I should adjust
RCU to avoid doing tracing during this period of time, which would allow
you to continue using rcu_irq_enter() as you are now. (You don't trace
from NMIs, do you? If you do, I guess I need to somehow disable tracing
during the vulnerable period?)

One simple-and-stupid fix would be for me to move rcu_eqs_enter_common()'s
trace_rcu_dyntick() statement into all its callers. Another would be to
give rcu_eqs_enter_common() an additional argument that is the amount to
subtract from ->dynticks_nesting, and have rcu_eqs_enter_common() do the
decrement after the tracing. The latter seems most reasonable at first
glance, especially given that it confines the risky data-inconsistent
period to a single contiguous piece of code.

So how about the following lightly tested patch?

I will check to see if something similar is needed for eqs exit.
Could you please let me know if tracing happens in NMI handlers?
If so, a bit of additional code will be needed.

Thanx, Paul

PS. Which reminds me, any short-term uses of RCU_TASKS? This represents
3 of my 16 test scenarios, which is getting hard to justify for
something that isn't used. Especially given that I will need to
add more scenarios for parallel-callbacks SRCU...

------------------------------------------------------------------------

commit 1a11d6fd0ae344f7dccd57d3c587d9ef2a4bf08e
Author: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
Date: Wed Apr 5 09:05:18 2017 -0700

rcu: Fix dyntick-idle tracing

The tracing subsystem started using rcu_irq_enter() and rcu_irq_exit()
(with my blessing) to allow the current _rcuidle alternative tracepoint
name to be dispensed with while still maintaining good performance.
Unfortunately, this causes RCU's dyntick-idle entry code's tracing to
appear to RCU like an interrupt that occurs where RCU is not designed
to handle interrupts.

This commit fixes this problem by moving the zeroing of ->dynticks_nesting
after the offending trace_rcu_dyntick() statement, which narrows the
window of vulnerability to a pair of adjacent statements that are now
marked with comments to that effect.

Reported-by: Steven Rostedt <rostedt@xxxxxxxxxxx>
Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 50fee7689e71..7308c88ee57a 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -771,25 +771,24 @@ cpu_needs_another_gp(struct rcu_state *rsp, struct rcu_data *rdp)
 }
 
 /*
- * rcu_eqs_enter_common - current CPU is moving towards extended quiescent state
+ * rcu_eqs_enter_common - current CPU is entering an extended quiescent state
  *
- * If the new value of the ->dynticks_nesting counter now is zero,
- * we really have entered idle, and must do the appropriate accounting.
- * The caller must have disabled interrupts.
+ * Enter idle, doing appropriate accounting.  The caller must have
+ * disabled interrupts.
  */
-static void rcu_eqs_enter_common(long long oldval, bool user)
+static void rcu_eqs_enter_common(bool user)
 {
 	struct rcu_state *rsp;
 	struct rcu_data *rdp;
-	RCU_TRACE(struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks);)
+	struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks);
 
-	trace_rcu_dyntick(TPS("Start"), oldval, rdtp->dynticks_nesting);
+	trace_rcu_dyntick(TPS("Start"), rdtp->dynticks_nesting, 0);
 	if (IS_ENABLED(CONFIG_RCU_EQS_DEBUG) &&
 	    !user && !is_idle_task(current)) {
 		struct task_struct *idle __maybe_unused =
 			idle_task(smp_processor_id());
 
-		trace_rcu_dyntick(TPS("Error on entry: not idle task"), oldval, 0);
+		trace_rcu_dyntick(TPS("Error on entry: not idle task"), rdtp->dynticks_nesting, 0);
 		rcu_ftrace_dump(DUMP_ORIG);
 		WARN_ONCE(1, "Current pid: %d comm: %s / Idle pid: %d comm: %s",
 			  current->pid, current->comm,
@@ -800,7 +799,8 @@ static void rcu_eqs_enter_common(long long oldval, bool user)
 		do_nocb_deferred_wakeup(rdp);
 	}
 	rcu_prepare_for_idle();
-	rcu_dynticks_eqs_enter();
+	rdtp->dynticks_nesting = 0; /* Breaks tracing momentarily. */
+	rcu_dynticks_eqs_enter(); /* After this, tracing works again. */
 	rcu_dynticks_task_enter();
 
 	/*
@@ -821,19 +821,15 @@ static void rcu_eqs_enter_common(long long oldval, bool user)
  */
 static void rcu_eqs_enter(bool user)
 {
-	long long oldval;
 	struct rcu_dynticks *rdtp;
 
 	rdtp = this_cpu_ptr(&rcu_dynticks);
-	oldval = rdtp->dynticks_nesting;
 	WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) &&
-		     (oldval & DYNTICK_TASK_NEST_MASK) == 0);
-	if ((oldval & DYNTICK_TASK_NEST_MASK) == DYNTICK_TASK_NEST_VALUE) {
-		rdtp->dynticks_nesting = 0;
-		rcu_eqs_enter_common(oldval, user);
-	} else {
+		     (rdtp->dynticks_nesting & DYNTICK_TASK_NEST_MASK) == 0);
+	if ((rdtp->dynticks_nesting & DYNTICK_TASK_NEST_MASK) == DYNTICK_TASK_NEST_VALUE)
+		rcu_eqs_enter_common(user);
+	else
 		rdtp->dynticks_nesting -= DYNTICK_TASK_NEST_VALUE;
-	}
 }
 
 /**
@@ -892,19 +888,18 @@ void rcu_user_enter(void)
  */
 void rcu_irq_exit(void)
 {
-	long long oldval;
 	struct rcu_dynticks *rdtp;
 
 	RCU_LOCKDEP_WARN(!irqs_disabled(), "rcu_irq_exit() invoked with irqs enabled!!!");
 	rdtp = this_cpu_ptr(&rcu_dynticks);
-	oldval = rdtp->dynticks_nesting;
-	rdtp->dynticks_nesting--;
 	WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) &&
-		     rdtp->dynticks_nesting < 0);
-	if (rdtp->dynticks_nesting)
-		trace_rcu_dyntick(TPS("--="), oldval, rdtp->dynticks_nesting);
-	else
-		rcu_eqs_enter_common(oldval, true);
+		     rdtp->dynticks_nesting < 1);
+	if (rdtp->dynticks_nesting <= 1) {
+		rcu_eqs_enter_common(true);
+	} else {
+		trace_rcu_dyntick(TPS("--="), rdtp->dynticks_nesting, rdtp->dynticks_nesting - 1);
+		rdtp->dynticks_nesting--;
+	}
 	rcu_sysidle_enter(1);
 }