Re: rcu_preempt self-detected stall on CPU from 4.5-rc3, since 3.17
From: Paul E. McKenney
Date: Tue Mar 22 2016 - 13:40:23 EST
On Tue, Mar 22, 2016 at 04:35:32PM +0000, Chatre, Reinette wrote:
> Hi Paul,
Hello, Reinette!
> On 2016-03-21, Paul E. McKenney wrote:
> > On Mon, Mar 21, 2016 at 09:22:30AM -0700, Jacob Pan wrote:
> >> On Fri, 18 Mar 2016 16:56:41 -0700
> >> "Paul E. McKenney" <paulmck@xxxxxxxxxxxxxxxxxx> wrote:
> >>> On Fri, Mar 18, 2016 at 02:00:11PM -0700, Josh Triplett wrote:
> >>>> On Thu, Feb 25, 2016 at 04:56:38PM -0800, Paul E. McKenney wrote:
> >
> > [ . . . ]
> >
> >>>> We're seeing a similar stall (~60 seconds) on an x86 development
> >>>> system here. Any luck tracking down the cause of this? If not, any
> >>>> suggestions for traces that might be helpful?
> >>>
> >>> The dmesg containing the stall, the kernel version, and the .config
> >>> would be helpful! Working on a torture test specific to this bug...
And thank you for the .config. Your kernel version looks to be 4.5.0.
> >> +Reinette, she has the system that can reproduce the issue. I
> >> believe she is having some other problems with it at the moment. But
> >> the .config should be available. Version is v4.5.
> >
> > A couple of additional questions:
> >
> > 1. Is the test running on bare metal or virtualized? If the
> > latter, what is the host?
>
> Bare metal.
OK, you are ahead of me. Mine is virtualized.
> > 2. Does the workload involve CPU hotplug?
>
> No.
Again, you are ahead of me. Mine makes extremely heavy use of CPU hotplug.
> > 3. Are you seeing things like this in dmesg?
> >
> > "rcu_preempt kthread starved for 21033 jiffies"
> > "rcu_sched kthread starved for 32103 jiffies"
> > "rcu_bh kthread starved for 84031 jiffies"
> >
> > If not, you are probably facing some other bug, and should
> > proceed debugging as described in Documentation/RCU/stallwarn.txt.
>
> Below is a sample of what I see as captured with v4.5. The kernel configuration is attached.
>
> [ 135.456197] INFO: rcu_preempt detected stalls on CPUs/tasks:
> [ 135.457729] 3-...: (0 ticks this GP) idle=722/0/0 softirq=5532/5532 fqs=0
> [ 135.459604] (detected by 2, t=60004 jiffies, g=2105, c=2104, q=165)
> [ 135.461318] Task dump for CPU 3:
> [ 135.461321] swapper/3 R running task 0 0 1 0x00200000
> [ 135.461325] 00000078560040e5 ffff88017846fed0 ffffffff818af2cc ffff880100000000
> [ 135.461330] 0000000600000003 ffff880178470000 ffff880072f32200 ffffffff822dcec0
> [ 135.461334] ffff88017846c000 ffff88017846c000 ffff88017846fee0 ffffffff818af517
> [ 135.461338] Call Trace:
> [ 135.461345] [<ffffffff818af2cc>] ? cpuidle_enter_state+0xfc/0x310
> [ 135.461349] [<ffffffff818af517>] ? cpuidle_enter+0x17/0x20
> [ 135.461353] [<ffffffff811515aa>] ? call_cpuidle+0x2a/0x40
> [ 135.461355] [<ffffffff8115197d>] ? cpu_startup_entry+0x28d/0x360
> [ 135.461360] [<ffffffff8108c874>] ? start_secondary+0x114/0x140
> [ 135.461365] rcu_preempt kthread starved for 60004 jiffies! g2105 c2104 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x1
And yes, it looks like you are seeing the same bug that I am tracing.
The kthread is blocked in schedule_timeout_interruptible(). With the
default configuration, this wait would have a three-jiffy timeout.
You set CONFIG_RCU_CPU_STALL_TIMEOUT=60, which matches the 60004 jiffies
above. Is that value due to a distro setting or something? Mainline
uses CONFIG_RCU_CPU_STALL_TIMEOUT=21.
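In case it helps to see the shape of it, here is a rough sketch of the
wait that the grace-period kthread performs in the RCU_GP_WAIT_FQS
state. This is illustration only, with made-up names and simplified
structure, not the actual kernel/rcu/tree.c code:

	#include <linux/kthread.h>
	#include <linux/sched.h>

	/*
	 * Simplified stand-in for the grace-period kthread's FQS wait.
	 * With HZ=1000 the default interval works out to three jiffies,
	 * so a 60004-jiffy starvation means roughly 20000 missed wakeups.
	 */
	static int demo_gp_fqs_wait(void *unused)
	{
		/* Assumed value; mainline derives this from HZ at boot. */
		unsigned long fqs_interval = 3;

		while (!kthread_should_stop()) {
			/*
			 * Sleep for a few jiffies; the timer (or an explicit
			 * wakeup) is supposed to bring us back.  If neither
			 * arrives, the kthread "starves" right here.
			 */
			schedule_timeout_interruptible(fqs_interval);

			/* ... scan for holdouts and force quiescent states ... */
		}
		return 0;
	}

The real code waits on an event with a timeout rather than sleeping
unconditionally, but the symptom is the same: the timeout never brings
the kthread back.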
> [ 135.463965] rcu_preempt S ffff88017844fd68 0 7 2 0x00000000
> [ 135.463969] ffff88017844fd68 ffff88017dd8cc80 ffff880177ff0000 ffff880178443b80
> [ 135.463973] ffff880178450000 ffff88017844fda0 ffff88017dd8cc80 ffff88017dd8cc80
> [ 135.463977] 0000000000000003 ffff88017844fd80 ffffffff81ab031f 0000000100031504
> [ 135.463981] Call Trace:
> [ 135.463986] [<ffffffff81ab031f>] schedule+0x3f/0xa0
> [ 135.463989] [<ffffffff81ab42d7>] schedule_timeout+0x127/0x270
> [ 135.463993] [<ffffffff81171a50>] ? detach_if_pending+0x120/0x120
> [ 135.463997] [<ffffffff8116da5d>] rcu_gp_kthread+0x6bd/0xa30
> [ 135.464000] [<ffffffff81151390>] ? wake_atomic_t_function+0x70/0x70
> [ 135.464003] [<ffffffff8116d3a0>] ? force_qs_rnp+0x1b0/0x1b0
> [ 135.464006] [<ffffffff8112f846>] kthread+0xe6/0x100
> [ 135.464009] [<ffffffff8112f760>] ? kthread_worker_fn+0x190/0x190
> [ 135.464012] [<ffffffff81ab5c0f>] ret_from_fork+0x3f/0x70
> [ 135.464015] [<ffffffff8112f760>] ? kthread_worker_fn+0x190/0x190
How long does it take to reproduce this? If it reproduces in minutes
or hours, could you please boot with the following on the kernel command
line and dump the trace buffer shortly after the stall?
ftrace trace_event=sched_waking,sched_wakeup,sched_wake_idle_without_ipi
If dumping manually shortly after the stall is at all non-trivial
(for example, if your reproduction time is many minutes or hours),
I can supply some patches that automate this. Or you can pick
them up from -rcu:
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git
Branch rcu/dev has these patches (and much else besides).
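If you want a feel for what those patches do, the gist is roughly the
following. This is a hand-waved sketch, not the actual rcu/dev code;
the helper name is made up and the real patches are more careful about
dumping only once and only when tracing is actually enabled:

	#include <linux/kernel.h>

	/*
	 * Hypothetical hook invoked from the stall-warning path: stop the
	 * ring buffer so it cannot wrap, then spill all per-CPU trace
	 * buffers to the console.
	 */
	static void demo_dump_trace_on_stall(void)
	{
		tracing_off();
		ftrace_dump(DUMP_ALL);
	}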
Thanx, Paul
PS: In case you are curious, when I enable those tracepoints, it
shows me that the timer is firing every three jiffies, as it
should, but that something goes wrong between the sched_waking event
and the IPI handler that should actually do the wakeup.
However, adding the traces significantly slows reproduction,
so I am writing a stress test specific to this bug to try to
speed things up, hopefully allowing more tracing to be added
while still retaining non-geologic reproduction times.