Re: RCU related tracebacks in various boot tests in -next
From: Paul E. McKenney
Date: Fri Nov 11 2016 - 13:29:06 EST
On Fri, Nov 11, 2016 at 06:32:43AM -0800, Guenter Roeck wrote:
> Hi,
>
> I see a number of RCU related tracebacks in recent versions of -next.
> The problem was first seen in next-20161109.
>
> I didn't bisect, but my wild guess is to blame Paul's related changes.
> Please let me know if a bisect is needed.
>
> Detailed logs are at http://kerneltests.org/builders.
> Build/test scripts are in the various subdirectories of
> https://github.com/groeck/linux-build-test/tree/master/rootfs/.
>
> Thanks,
> Guenter
>
> --------
This is a failure on my part to convert fully to the new scheme, will fix.
(The third trace makes the situation clear, this one at first glance
looks like a failure to invoke irq_enter(), but it is most likely another
symptom of the problem in the third trace.)
> arm:
>
> [ 1.699096] ===============================
> [ 1.699188] suspicious RCU usage. ]
> [ 1.699340] 4.9.0-rc4-next-20161111 #4 Not tainted
> [ 1.699462] -------------------------------
> [ 1.699554] ./include/trace/events/irq.h:69 suspicious rcu_dereference_check() usage!
> [ 1.699707]
> [ 1.699707] other info that might help us debug this:
> [ 1.699707]
> [ 1.699890]
> [ 1.699890] RCU used illegally from idle CPU!
> [ 1.699890] rcu_scheduler_active = 1, debug_locks = 0
> [ 1.700103] RCU used illegally from extended quiescent state!
> [ 1.700256] no locks held by swapper/0/0.
> [ 1.700317]
> [ 1.700317] stack backtrace:
> [ 1.700500] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.9.0-rc4-next-20161111 #4
> [ 1.700653] Hardware name: Generic OMAP3 (Flattened Device Tree)
> [ 1.700988] [<c01100ac>] (unwind_backtrace) from [<c010c290>] (show_stack+0x10/0x14)
> [ 1.701171] [<c010c290>] (show_stack) from [<c048f688>] (dump_stack+0xb0/0xe4)
> [ 1.701354] [<c048f688>] (dump_stack) from [<c01a319c>] (__handle_irq_event_percpu+0x2fc/0x3d0)
> [ 1.701538] [<c01a319c>] (__handle_irq_event_percpu) from [<c01a328c>] (handle_irq_event_percpu+0x1c/0x58)
> [ 1.701751] [<c01a328c>] (handle_irq_event_percpu) from [<c01a3300>] (handle_irq_event+0x38/0x5c)
> [ 1.701934] [<c01a3300>] (handle_irq_event) from [<c01a65d0>] (handle_level_irq+0xb8/0x14c)
> [ 1.702087] [<c01a65d0>] (handle_level_irq) from [<c01a2208>] (generic_handle_irq+0x20/0x34)
> [ 1.702270] [<c01a2208>] (generic_handle_irq) from [<c01a2748>] (__handle_domain_irq+0x64/0xe0)
> [ 1.702453] [<c01a2748>] (__handle_domain_irq) from [<c07dc7b0>] (__irq_svc+0x70/0x98)
> [ 1.702606] [<c07dc7b0>] (__irq_svc) from [<c0108360>] (arch_cpu_idle+0x20/0x3c)
> [ 1.702789] [<c0108360>] (arch_cpu_idle) from [<c01882ec>] (cpu_startup_entry+0x180/0x224)
> [ 1.702941] [<c01882ec>] (cpu_startup_entry) from [<c0b00c10>] (start_kernel+0x344/0x3b8)
> [ 1.703186] [<c0b00c10>] (start_kernel) from [<8000807c>] (0x8000807c)
>
> arm64:
For this second one....
> [ 1.247359] ------------[ cut here ]------------
> [ 1.247761] WARNING: CPU: 1 PID: 0 at kernel/rcu/tree.c:380 rcu_momentary_dyntick_idle+0xc0/0xc8
> [ 1.247884] Modules linked in:
> [ 1.247987]
> [ 1.248150] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.9.0-rc4-next-20161111 #1
> [ 1.248241] Hardware name: ZynqMP EP108 (DT)
> [ 1.248350] task: ffff80007c4d1900 task.stack: ffff80007c4e0000
> [ 1.248445] PC is at rcu_momentary_dyntick_idle+0xc0/0xc8
> [ 1.248523] LR is at rcu_note_context_switch+0x1b4/0x2c0
> [ 1.248595] pc : [<ffff000008114df0>] lr : [<ffff000008119a0c>] pstate: 600000c5
> [ 1.248676] sp : ffff80007c4e3ea0
> [ 1.248738] x29: ffff80007c4e3ea0 x28: ffff000008d88000
> [ 1.248842] x27: ffff80007c4e0000 x26: ffff000008da4803
> [ 1.248919] x25: 0000000000000000 x24: ffff80007c4e0000
> [ 1.248992] x23: ffff0000088b3544 x22: ffff80007c4d1900
> [ 1.249065] x21: ffff80007efb1880 x20: ffff000008d88000
> [ 1.249138] x19: ffff80007c4d1900 x18: 000000000000000e
> [ 1.249211] x17: 0000000000000007 x16: 0000000000000001
> [ 1.249285] x15: 000000000000004c x14: 0000000000000000
> [ 1.249362] x13: 000000000000000b x12: 0000000000000006
> [ 1.249435] x11: ffff0000088e0ad8 x10: ffff80007efb18b0
> [ 1.249586] x9 : ffff0000088e0ad4 x8 : 00000000029f6300
> [ 1.249715] x7 : 7fffffffffffffff x6 : ffff80007efabbd0
> [ 1.249839] x5 : 0000000000000004 x4 : ffff000008da0688
> [ 1.249963] x3 : 0000000000000004 x2 : ffff80007efabbdc
> [ 1.250088] x1 : 0000000000000001 x0 : ffff000008da4811
> [ 1.250213]
> [ 1.250325] ---[ end trace 2930a57464ba65da ]---
> [ 1.250551] Call trace:
> [ 1.251134] Exception stack(0xffff80007c4e3cd0 to 0xffff80007c4e3e00)
> [ 1.251396] 3cc0: ffff80007c4d1900 0001000000000000
> [ 1.251605] 3ce0: ffff80007c4e3ea0 ffff000008114df0 ffff000008e6c388 ffff80007c4e3f80
> [ 1.251790] 3d00: ffff80007c4e3d90 ffff00000838b804 ffff000008e6c3ac ffff80007c4e3fd0
> [ 1.251972] 3d20: 00000000ffffffd8 ffff000008910b8c 00000000000003e0 ffff000008b4c1c1
> [ 1.252151] 3d40: ffff000008e6c388 ffff80007c4e3fd0 ffff000008b4c1bd ffff000008107264
> [ 1.252331] 3d60: ffff80007c4e3d90 ffff00000838b4e0 ffff000008da4811 0000000000000001
> [ 1.252509] 3d80: ffff80007efabbdc 0000000000000004 ffff000008da0688 0000000000000004
> [ 1.252688] 3da0: ffff80007efabbd0 7fffffffffffffff 00000000029f6300 ffff0000088e0ad4
> [ 1.252866] 3dc0: ffff80007efb18b0 ffff0000088e0ad8 0000000000000006 000000000000000b
> [ 1.253043] 3de0: 0000000000000000 000000000000004c 0000000000000001 0000000000000007
> [ 1.253269] [<ffff000008114df0>] rcu_momentary_dyntick_idle+0xc0/0xc8
> [ 1.253442] [<ffff0000088b3000>] __schedule+0x68/0x570
> [ 1.253598] [<ffff0000088b3544>] schedule+0x3c/0xa8
> [ 1.253726] [<ffff0000088b3998>] schedule_preempt_disabled+0x20/0x38
> [ 1.253884] [<ffff0000080f893c>] cpu_startup_entry+0xd4/0x1d8
> [ 1.254026] [<ffff00000808e820>] secondary_start_kernel+0x158/0x198
> [ 1.254268] [<00000000008bd1a4>] 0x8bd1a4
Looks like I got too excited about adding WARN_ON_ONCE(), will fix.
> s390:
>
> ===============================
>
> suspicious RCU usage. ]
> 4.9.0-rc4-next-20161111 #1 Not tainted
> -------------------------------
> ./include/trace/events/rcu.h:454 suspicious rcu_dereference_check() usage!
>
> other info that might help us debug this:
>
> RCU used illegally from idle CPU!
> rcu_scheduler_active = 1, debug_locks = 0
> RCU used illegally from extended quiescent state!
> no locks held by swapper/0/0.
> stack backtrace:
> CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.9.0-rc4-next-20161111 #1
> Hardware name: QEMU QEMU QEMU (KVM)
> Stack:
> 000000001fffbcb0 000000001fffbd40 0000000000000003 0000000000000000
> 000000001fffbde0 000000001fffbd58 000000001fffbd58 0000000000000020
> 0000000001e20b70 ffffffffffffffff 0000000000000000 0404c0010000000a
> 000000000000000c 000000001fffbda8 0000000000000000 0000000000000000
> 0000000000000000 00000000001168f4 000000001fffbd40 000000001fffbd88
>
> Call Trace:
> ([<00000000001167b0>] show_trace+0x100/0x104)
> [<0000000000116896>] show_stack+0x6e/0xec
> [<00000000007985e0>] dump_stack+0x94/0xd4
> [<000000000021882e>] rcu_eqs_exit_common.isra.49+0x1ee/0x21c
> [<000000000022172a>] rcu_irq_enter+0x1ca/0x284
> [<0000000000182178>] irq_enter+0x40/0xc0
> [<000000000010dcca>] do_IRQ+0x5a/0x94 [<0000000000ad234e>] ext_int_handler+0x11e/0x124
> [<0000000000103c50>] enabled_wait+0x64/0x110
> ([<0000000000103c3c>] enabled_wait+0x50/0x110)
> [<0000000000103eca>] arch_cpu_idle+0x42/0x54
> [<0000000000ad0576>] default_idle_call+0x4e/0x64
> [<00000000001ddcb0>] cpuidle_idle_call+0x54/0xa4
> [<00000000001dde62>] cpu_startup_entry+0xea/0x12c
> [<0000000000eaec82>] start_kernel+0x3ee/0x3fc
> [<0000000000100020>] _stext+0x20/0x80
And here I failed to convert one of the accesses to the ->dynticks
counter to the new scheme. Again, will fix, patch below for the
-next you were using.
And thank you for your testing efforts!
Thanx, Paul
------------------------------------------------------------------------
commit ee17957e96a196d3ea70a4b1c20ca09f8394ce66
Author: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
Date: Fri Nov 11 10:24:56 2016 -0800
fixup! rcu: Abstract the dynticks snapshot operation
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index c1e3d4a333b2..03f7bcbc2e99 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1139,7 +1139,9 @@ void rcu_nmi_exit(void)
*/
bool notrace __rcu_is_watching(void)
{
- return atomic_read(this_cpu_ptr(&rcu_dynticks.dynticks)) & 0x1;
+ struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks);
+
+ return rcu_dynticks_in_eqs(rcu_dynticks_snap(rdtp));
}
/**