RE: rcu_preempt self-detected stall on CPU from 4.5-rc3, since 3.17

From: Chatre, Reinette
Date: Wed Mar 23 2016 - 13:15:31 EST


Hi Paul,

On 2016-03-22, Paul E. McKenney wrote:
> On Tue, Mar 22, 2016 at 09:04:47PM +0000, Chatre, Reinette wrote:
>> On 2016-03-22, Paul E. McKenney wrote:
>>> You set CONFIG_RCU_CPU_STALL_TIMEOUT=60, which matches the 60004
>>> jiffies above. Is that value due to a distro setting or something?
>>> Mainline uses CONFIG_RCU_CPU_STALL_TIMEOUT=21.
>>
>> Indeed ... this value originated from a Fedora configuration.
>
> OK. Setting it shorter might (or might not) make it reproduce more
> quickly. This can be set at boot time via rcupdate.rcu_cpu_stall_timeout.
> Or at compile time via CONFIG_RCU_CPU_STALL_TIMEOUT.

I kept the original configuration and seem to be able to reproduce with that.


>>> If dumping manually shortly after the stall is at all non-trivial
>>> (for example, if your reproduction time is many minute or hours),
>>> I can supply some patches that automate this. Or you can pick
>>> them up from -rcu:
>>
>> ... could you please point me to the patches you refer to? Or would you like
> me to try with the entire kernel from rcu/dev?
>
> 2dc92e2a86b9 (rcu: Awaken grace-period kthread if too long since FQS)
> c3fd2095d015 (rcu: Dump ftrace buffer when kicking grace-period kthread)
>
> There might be other dependencies, but these are the two that you need.

I did not look closely at the patches when I applied them and because of that missed that they need a kernel parameter to be activated. After leaving the system idle overnight with these patches the stalls occurred but without the parameter I did not capture the data you need. I will try again tonight. Below are the traces from last night just in case they have value to you.

[10154.635318] INFO: rcu_preempt detected stalls on CPUs/tasks:
[10154.639218] 1-...: (0 ticks this GP) idle=c4e/0/0 softirq=99936/99936 fqs=0
[10154.643497] (detected by 0, t=60005 jiffies, g=24190, c=24189, q=79)
[10154.647596] Task dump for CPU 1:
[10154.650818] swapper/1 R running task 0 0 1 0x00200000
[10154.655052] 00002656bf74de5e ffff8801785cfed0 ffffffff818af34c ffff880100000000
[10154.659349] 0000000600000003 ffff8801785d0000 ffff880072f0bc00 ffffffff822dcf80
[10154.663636] ffff8801785cc000 ffff8801785cc000 ffff8801785cfee0 ffffffff818af597
[10154.667916] Call Trace:
[10154.670845] [<ffffffff818af34c>] ? cpuidle_enter_state+0xfc/0x310
[10154.674802] [<ffffffff818af597>] ? cpuidle_enter+0x17/0x20
[10154.678564] [<ffffffff811515ba>] ? call_cpuidle+0x2a/0x40
[10154.682295] [<ffffffff8115198d>] ? cpu_startup_entry+0x28d/0x360
[10154.686187] [<ffffffff8108c864>] ? start_secondary+0x114/0x140
[10154.690040] rcu_preempt kthread starved for 60005 jiffies! g24190 c24189 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x1
[10154.694944] rcu_preempt S ffff8801785b7d68 0 7 2 0x00000000
[10154.699062] ffff8801785b7d68 ffff88017dc8cc80 ffff8801785c3b80 ffff8801785abb80
[10154.703275] ffff8801785b8000 ffff8801785b7da0 ffff88017dc8cc80 ffff88017dc8cc80
[10154.707481] 0000000000000003 ffff8801785b7d80 ffffffff81ab03df 00000001027e21aa
[10154.711692] Call Trace:
[10154.714548] [<ffffffff81ab03df>] schedule+0x3f/0xa0
[10154.718075] [<ffffffff81ab4397>] schedule_timeout+0x127/0x270
[10154.721832] [<ffffffff81171a00>] ? detach_if_pending+0x120/0x120
[10154.725659] [<ffffffff8116d983>] rcu_gp_kthread+0x6d3/0xa40
[10154.729379] [<ffffffff811513a0>] ? wake_atomic_t_function+0x70/0x70
[10154.733235] [<ffffffff8116d2b0>] ? force_qs_rnp+0x1b0/0x1b0
[10154.736854] [<ffffffff8112f856>] kthread+0xe6/0x100
[10154.740267] [<ffffffff8112f770>] ? kthread_worker_fn+0x190/0x190
[10154.743980] [<ffffffff81ab5ccf>] ret_from_fork+0x3f/0x70
[10154.747511] [<ffffffff8112f770>] ? kthread_worker_fn+0x190/0x190
[11348.912706] INFO: rcu_preempt detected stalls on CPUs/tasks:
[11348.916346] 2-...: (0 ticks this GP) idle=586/0/0 softirq=133504/133504 fqs=0
[11348.920407] (detected by 3, t=60002 jiffies, g=26799, c=26798, q=72)
[11348.924244] Task dump for CPU 2:
[11348.927205] swapper/2 R running task 0 0 1 0x00200000
[11348.931178] 00002adc83427a76 ffff8801785d3ed0 ffffffff818af34c ffff880100000000
[11348.935217] 0000000600000003 ffff8801785d4000 ffff880177d01e00 ffffffff822dcf80
[11348.939237] ffff8801785d0000 ffff8801785d0000 ffff8801785d3ee0 ffffffff818af597
[11348.943252] Call Trace:
[11348.945921] [<ffffffff818af34c>] ? cpuidle_enter_state+0xfc/0x310
[11348.949615] [<ffffffff818af597>] ? cpuidle_enter+0x17/0x20
[11348.953115] [<ffffffff811515ba>] ? call_cpuidle+0x2a/0x40
[11348.956584] [<ffffffff8115198d>] ? cpu_startup_entry+0x28d/0x360
[11348.960215] [<ffffffff8108c864>] ? start_secondary+0x114/0x140
[11348.963808] rcu_preempt kthread starved for 60002 jiffies! g26799 c26798 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x1
[11348.968452] rcu_preempt S ffff8801785b7d68 0 7 2 0x00000000
[11348.972309] ffff8801785b7d68 ffff88017dd0cc80 ffff8801785c5940 ffff8801785abb80
[11348.976266] ffff8801785b8000 ffff8801785b7da0 ffff88017dd0cc80 ffff88017dd0cc80
[11348.980207] 0000000000000003 ffff8801785b7d80 ffffffff81ab03df 0000000102c9d45e
[11348.984142] Call Trace:
[11348.986714] [<ffffffff81ab03df>] schedule+0x3f/0xa0
[11348.989974] [<ffffffff81ab4397>] schedule_timeout+0x127/0x270
[11348.993453] [<ffffffff81171a00>] ? detach_if_pending+0x120/0x120
[11348.997000] [<ffffffff8116d983>] rcu_gp_kthread+0x6d3/0xa40
[11349.000431] [<ffffffff811513a0>] ? wake_atomic_t_function+0x70/0x70
[11349.004040] [<ffffffff8116d2b0>] ? force_qs_rnp+0x1b0/0x1b0
[11349.007449] [<ffffffff8112f856>] kthread+0xe6/0x100
[11349.010676] [<ffffffff8112f770>] ? kthread_worker_fn+0x190/0x190
[11349.014216] [<ffffffff81ab5ccf>] ret_from_fork+0x3f/0x70
[11349.017568] [<ffffffff8112f770>] ? kthread_worker_fn+0x190/0x190
[11790.539207] INFO: rcu_preempt detected stalls on CPUs/tasks:
[11790.542652] 2-...: (0 ticks this GP) idle=944/0/0 softirq=133668/133668 fqs=0
[11790.546505] (detected by 3, t=60003 jiffies, g=27589, c=27588, q=72)
[11790.550124] Task dump for CPU 2:
[11790.552871] swapper/2 R running task 0 0 1 0x00200000
[11790.556678] 00002c8422fed65e ffff8801785d3ed0 ffffffff818af34c ffff880100000000
[11790.560580] 0000000600000003 ffff8801785d4000 ffff880177d01e00 ffffffff822dcf80
[11790.564485] ffff8801785d0000 ffff8801785d0000 ffff8801785d3ee0 ffffffff818af597
[11790.568389] Call Trace:
[11790.570942] [<ffffffff818af34c>] ? cpuidle_enter_state+0xfc/0x310
[11790.574525] [<ffffffff818af597>] ? cpuidle_enter+0x17/0x20
[11790.577934] [<ffffffff811515ba>] ? call_cpuidle+0x2a/0x40
[11790.581311] [<ffffffff8115198d>] ? cpu_startup_entry+0x28d/0x360
[11790.584853] [<ffffffff8108c864>] ? start_secondary+0x114/0x140
[11790.588350] rcu_preempt kthread starved for 60003 jiffies! g27589 c27588 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x1
[11790.592945] rcu_preempt S ffff8801785b7d68 0 7 2 0x00000000
[11790.596808] ffff8801785b7d68 ffff88017dd0cc80 ffff8801785c5940 ffff8801785abb80
[11790.600751] ffff8801785b8000 ffff8801785b7da0 ffff88017dd0cc80 ffff88017dd0cc80
[11790.604688] 0000000000000003 ffff8801785b7d80 ffffffff81ab03df 0000000102e5d253
[11790.608635] Call Trace:
[11790.611220] [<ffffffff81ab03df>] schedule+0x3f/0xa0
[11790.614480] [<ffffffff81ab4397>] schedule_timeout+0x127/0x270
[11790.617971] [<ffffffff81171a00>] ? detach_if_pending+0x120/0x120
[11790.621530] [<ffffffff8116d983>] rcu_gp_kthread+0x6d3/0xa40
[11790.624972] [<ffffffff811513a0>] ? wake_atomic_t_function+0x70/0x70
[11790.628558] [<ffffffff8116d2b0>] ? force_qs_rnp+0x1b0/0x1b0
[11790.631904] [<ffffffff8112f856>] kthread+0xe6/0x100
[11790.635043] [<ffffffff8112f770>] ? kthread_worker_fn+0x190/0x190
[11790.638485] [<ffffffff81ab5ccf>] ret_from_fork+0x3f/0x70
[11790.641745] [<ffffffff8112f770>] ? kthread_worker_fn+0x190/0x190
[11820.319166] INFO: rcu_preempt detected stalls on CPUs/tasks:
[11820.322523] 2-...: (0 ticks this GP) idle=9b4/0/0 softirq=133674/133674 fqs=0
[11820.326301] (detected by 3, t=60002 jiffies, g=27620, c=27619, q=64)
[11820.329856] Task dump for CPU 2:
[11820.332536] swapper/2 R running task 0 0 1 0x00200000
[11820.336228] 00002ca4dc6b2bb1 ffff8801785d3ed0 ffffffff818af34c ffff880100000000
[11820.339990] 0000000600000003 ffff8801785d4000 ffff880177d01e00 ffffffff822dcf80
[11820.343749] ffff8801785d0000 ffff8801785d0000 ffff8801785d3ee0 ffffffff818af597
[11820.347483] Call Trace:
[11820.349852] [<ffffffff818af34c>] ? cpuidle_enter_state+0xfc/0x310
[11820.353250] [<ffffffff818af597>] ? cpuidle_enter+0x17/0x20
[11820.356463] [<ffffffff811515ba>] ? call_cpuidle+0x2a/0x40
[11820.359637] [<ffffffff8115198d>] ? cpu_startup_entry+0x28d/0x360
[11820.362973] [<ffffffff8108c864>] ? start_secondary+0x114/0x140
[11820.366264] rcu_preempt kthread starved for 60002 jiffies! g27620 c27619 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x1
[11820.370643] rcu_preempt S ffff8801785b7d68 0 7 2 0x00000000
[11820.374276] ffff8801785b7d68 ffff88017dd0cc80 ffff8801785c5940 ffff8801785abb80
[11820.378026] ffff8801785b8000 ffff8801785b7da0 ffff88017dd0cc80 ffff88017dd0cc80
[11820.381775] 0000000000000003 ffff8801785b7d80 ffffffff81ab03df 0000000102e7b58c
[11820.385537] Call Trace:
[11820.387940] [<ffffffff81ab03df>] schedule+0x3f/0xa0
[11820.391005] [<ffffffff81ab4397>] schedule_timeout+0x127/0x270
[11820.394286] [<ffffffff81171a00>] ? detach_if_pending+0x120/0x120
[11820.397626] [<ffffffff8116d983>] rcu_gp_kthread+0x6d3/0xa40
[11820.400853] [<ffffffff811513a0>] ? wake_atomic_t_function+0x70/0x70
[11820.404269] [<ffffffff8116d2b0>] ? force_qs_rnp+0x1b0/0x1b0
[11820.407478] [<ffffffff8112f856>] kthread+0xe6/0x100
[11820.410493] [<ffffffff8112f770>] ? kthread_worker_fn+0x190/0x190
[11820.413824] [<ffffffff81ab5ccf>] ret_from_fork+0x3f/0x70
[11820.416966] [<ffffffff8112f770>] ? kthread_worker_fn+0x190/0x190
[11903.292474] INFO: rcu_preempt detected stalls on CPUs/tasks:
[11903.295736] 2-...: (0 ticks this GP) idle=bd2/0/0 softirq=133700/133700 fqs=0
[11903.299427] (detected by 3, t=60003 jiffies, g=27759, c=27758, q=277)
[11903.302921] Task dump for CPU 2:
[11903.305513] swapper/2 R running task 0 0 1 0x00200000
[11903.309164] 00002cf65a063204 ffff8801785d3ed0 ffffffff818af34c ffff880100000000
[11903.312931] 0000000600000003 ffff8801785d4000 ffff880177d01e00 ffffffff822dcf80
[11903.316685] ffff8801785d0000 ffff8801785d0000 ffff8801785d3ee0 ffffffff818af597
[11903.320426] Call Trace:
[11903.322802] [<ffffffff818af34c>] ? cpuidle_enter_state+0xfc/0x310
[11903.326199] [<ffffffff818af597>] ? cpuidle_enter+0x17/0x20
[11903.329416] [<ffffffff811515ba>] ? call_cpuidle+0x2a/0x40
[11903.332591] [<ffffffff8115198d>] ? cpu_startup_entry+0x28d/0x360
[11903.335931] [<ffffffff8108c864>] ? start_secondary+0x114/0x140
[11903.339226] rcu_preempt kthread starved for 60003 jiffies! g27759 c27758 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x1
[11903.343607] rcu_preempt S ffff8801785b7d68 0 7 2 0x00000000
[11903.347246] ffff8801785b7d68 ffff88017dd0cc80 ffff8801785c5940 ffff8801785abb80
[11903.350999] ffff8801785b8000 ffff8801785b7da0 ffff88017dd0cc80 ffff88017dd0cc80
[11903.354750] 0000000000000003 ffff8801785b7d80 ffffffff81ab03df 0000000102ecf7e4
[11903.358508] Call Trace:
[11903.360909] [<ffffffff81ab03df>] schedule+0x3f/0xa0
[11903.363979] [<ffffffff81ab4397>] schedule_timeout+0x127/0x270
[11903.367259] [<ffffffff81171a00>] ? detach_if_pending+0x120/0x120
[11903.370601] [<ffffffff8116d983>] rcu_gp_kthread+0x6d3/0xa40
[11903.373829] [<ffffffff811513a0>] ? wake_atomic_t_function+0x70/0x70
[11903.377243] [<ffffffff8116d2b0>] ? force_qs_rnp+0x1b0/0x1b0
[11903.380455] [<ffffffff8112f856>] kthread+0xe6/0x100
[11903.383473] [<ffffffff8112f770>] ? kthread_worker_fn+0x190/0x190
[11903.386804] [<ffffffff81ab5ccf>] ret_from_fork+0x3f/0x70
[11903.389942] [<ffffffff8112f770>] ? kthread_worker_fn+0x190/0x190
[12101.322638] INFO: rcu_preempt detected stalls on CPUs/tasks:
[12101.325902] 2-...: (19 GPs behind) idle=880/0/0 softirq=133768/133768 fqs=1
[12101.329542] (detected by 1, t=60004 jiffies, g=28170, c=28169, q=41)
[12101.333009] Task dump for CPU 2:
[12101.335599] swapper/2 R running task 0 0 1 0x00200000
[12101.339248] 00002db4171b61e1 ffff8801785d3ed0 ffffffff818af34c ffff880100000000
[12101.343008] 0000000600000003 ffff8801785d4000 ffff880177d01e00 ffffffff822dcf80
[12101.346763] ffff8801785d0000 ffff8801785d0000 ffff8801785d3ee0 ffffffff818af597
[12101.350501] Call Trace:
[12101.352868] [<ffffffff818af34c>] ? cpuidle_enter_state+0xfc/0x310
[12101.356262] [<ffffffff818af597>] ? cpuidle_enter+0x17/0x20
[12101.359475] [<ffffffff811515ba>] ? call_cpuidle+0x2a/0x40
[12101.362645] [<ffffffff8115198d>] ? cpu_startup_entry+0x28d/0x360
[12101.365980] [<ffffffff8108c864>] ? start_secondary+0x114/0x140
[12101.369270] rcu_preempt kthread starved for 60000 jiffies! g28170 c28169 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x1
[12101.373646] rcu_preempt S ffff8801785b7d68 0 7 2 0x00000000
[12101.377281] ffff8801785b7d68 ffff88017dd0cc80 ffff8801785c5940 ffff8801785abb80
[12101.381030] ffff8801785b8000 ffff8801785b7da0 ffff88017dd0cc80 ffff88017dd0cc80
[12101.384778] 0000000000000003 ffff8801785b7d80 ffffffff81ab03df 0000000102f98533
[12101.388532] Call Trace:
[12101.390927] [<ffffffff81ab03df>] schedule+0x3f/0xa0
[12101.393991] [<ffffffff81ab4397>] schedule_timeout+0x127/0x270
[12101.397266] [<ffffffff81171a00>] ? detach_if_pending+0x120/0x120
[12101.400602] [<ffffffff8116d983>] rcu_gp_kthread+0x6d3/0xa40
[12101.403825] [<ffffffff811513a0>] ? wake_atomic_t_function+0x70/0x70
[12101.407240] [<ffffffff8116d2b0>] ? force_qs_rnp+0x1b0/0x1b0
[12101.410450] [<ffffffff8112f856>] kthread+0xe6/0x100
[12101.413465] [<ffffffff8112f770>] ? kthread_worker_fn+0x190/0x190
[12101.416793] [<ffffffff81ab5ccf>] ret_from_fork+0x3f/0x70
[12101.419930] [<ffffffff8112f770>] ? kthread_worker_fn+0x190/0x190
[12269.411058] INFO: rcu_preempt detected stalls on CPUs/tasks:
[12269.414316] 2-...: (27 GPs behind) idle=daa/0/0 softirq=133902/133902 fqs=1
[12269.417947] (detected by 0, t=60003 jiffies, g=28504, c=28503, q=347)
[12269.421426] Task dump for CPU 2:
[12269.424012] swapper/2 R running task 0 0 1 0x00200000
[12269.427656] 00002e58b5f8d033 ffff8801785d3ed0 ffffffff818af34c ffff880100000000
[12269.431405] 0000000600000003 ffff8801785d4000 ffff880177d01e00 ffffffff822dcf80
[12269.435153] ffff8801785d0000 ffff8801785d0000 ffff8801785d3ee0 ffffffff818af597
[12269.438886] Call Trace:
[12269.441252] [<ffffffff818af34c>] ? cpuidle_enter_state+0xfc/0x310
[12269.444638] [<ffffffff818af597>] ? cpuidle_enter+0x17/0x20
[12269.447850] [<ffffffff811515ba>] ? call_cpuidle+0x2a/0x40
[12269.451014] [<ffffffff8115198d>] ? cpu_startup_entry+0x28d/0x360
[12269.454348] [<ffffffff8108c864>] ? start_secondary+0x114/0x140
[12269.457634] rcu_preempt kthread starved for 59872 jiffies! g28504 c28503 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x1
[12269.462008] rcu_preempt S ffff8801785b7d68 0 7 2 0x00000000
[12269.465636] ffff8801785b7d68 ffff88017dd0cc80 ffff8801785c5940 ffff8801785abb80
[12269.469380] ffff8801785b8000 ffff8801785b7da0 ffff88017dd0cc80 ffff88017dd0cc80
[12269.473121] 0000000000000003 ffff8801785b7d80 ffffffff81ab03df 0000000103042d26
[12269.476868] Call Trace:
[12269.479262] [<ffffffff81ab03df>] schedule+0x3f/0xa0
[12269.482321] [<ffffffff81ab4397>] schedule_timeout+0x127/0x270
[12269.485592] [<ffffffff81171a00>] ? detach_if_pending+0x120/0x120
[12269.488926] [<ffffffff8116d983>] rcu_gp_kthread+0x6d3/0xa40
[12269.492144] [<ffffffff811513a0>] ? wake_atomic_t_function+0x70/0x70
[12269.495554] [<ffffffff8116d2b0>] ? force_qs_rnp+0x1b0/0x1b0
[12269.498756] [<ffffffff8112f856>] kthread+0xe6/0x100
[12269.501764] [<ffffffff8112f770>] ? kthread_worker_fn+0x190/0x190
[12269.505090] [<ffffffff81ab5ccf>] ret_from_fork+0x3f/0x70
[12269.508224] [<ffffffff8112f770>] ? kthread_worker_fn+0x190/0x190
[12426.587923] INFO: rcu_preempt detected stalls on CPUs/tasks:
[12426.591181] 2-...: (17 GPs behind) idle=15c/0/0 softirq=134033/134034 fqs=1
[12426.594815] (detected by 0, t=60002 jiffies, g=28804, c=28803, q=396)
[12426.598296] Task dump for CPU 2:
[12426.600881] swapper/2 R running task 0 0 1 0x00200000
[12426.604528] 00002eed4414d5d4 ffff8801785d3ed0 ffffffff818af34c ffff880100000000
[12426.608284] 0000000600000003 ffff8801785d4000 ffff880177d01e00 ffffffff822dcf80
[12426.612030] ffff8801785d0000 ffff8801785d0000 ffff8801785d3ee0 ffffffff818af597
[12426.615762] Call Trace:
[12426.618131] [<ffffffff818af34c>] ? cpuidle_enter_state+0xfc/0x310
[12426.621522] [<ffffffff818af597>] ? cpuidle_enter+0x17/0x20
[12426.624733] [<ffffffff811515ba>] ? call_cpuidle+0x2a/0x40
[12426.627903] [<ffffffff8115198d>] ? cpu_startup_entry+0x28d/0x360
[12426.631238] [<ffffffff8108c864>] ? start_secondary+0x114/0x140
[12426.634527] rcu_preempt kthread starved for 59998 jiffies! g28804 c28803 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x1
[12426.638900] rcu_preempt S ffff8801785b7d68 0 7 2 0x00000000
[12426.642535] ffff8801785b7d68 ffff88017dd0cc80 ffff8801785c5940 ffff8801785abb80
[12426.646285] ffff8801785b8000 ffff8801785b7da0 ffff88017dd0cc80 ffff88017dd0cc80
[12426.650031] 0000000000000003 ffff8801785b7d80 ffffffff81ab03df 00000001030e230e
[12426.653787] Call Trace:
[12426.656186] [<ffffffff81ab03df>] schedule+0x3f/0xa0
[12426.659251] [<ffffffff81ab4397>] schedule_timeout+0x127/0x270
[12426.662528] [<ffffffff81171a00>] ? detach_if_pending+0x120/0x120
[12426.665868] [<ffffffff8116d983>] rcu_gp_kthread+0x6d3/0xa40
[12426.669091] [<ffffffff811513a0>] ? wake_atomic_t_function+0x70/0x70
[12426.672502] [<ffffffff8116d2b0>] ? force_qs_rnp+0x1b0/0x1b0
[12426.675710] [<ffffffff8112f856>] kthread+0xe6/0x100
[12426.678724] [<ffffffff8112f770>] ? kthread_worker_fn+0x190/0x190
[12426.682052] [<ffffffff81ab5ccf>] ret_from_fork+0x3f/0x70
[12426.685188] [<ffffffff8112f770>] ? kthread_worker_fn+0x190/0x190