Re: rcu_sched stalls in idle task introduced in pre-4.8?

From: Rich Felker
Date: Tue Aug 02 2016 - 15:24:16 EST


On Tue, Aug 02, 2016 at 11:16:36AM -0700, Paul E. McKenney wrote:
> On Tue, Aug 02, 2016 at 01:04:15PM -0400, Rich Felker wrote:
> > Hi Paul,
> >
> > As I mentioned on Twitter, I'm experiencing (otherwise benign) rcu
> > stall messages since rebasing my tree on Linus's in-progress merge for
> > 4.8, as part of updating both the arch/sh tree and drivers going
> > upstream through other maintainers to account for changes made to
> > infrastructure they depend on. The revisions I've tried rebasing on
> > are 194dc870a5890e855ecffb30f3b80ba7c88f96d6 and
> > 7f155c702677d057d03b192ce652311de5434697 (now using the latter).
>
> Very helpful, thank you!
>
> Oh, and you can boot with rcupdate.rcu_cpu_stall_suppress to prevent
> RCU CPU stall warnings from appearing, though that of course won't
> help with whatever the underlying problem might be.

Indeed, but I have found them quite useful in the past for tracking
down actual deadlocks, so it's nice to keep them enabled.

> > The system I'm testing on is an SMP configuration of the J2 SoC on
> > FPGA (see: j-core.org), and the SMP support and timer driver (most
> > relevant, imo) are new, but never exhibited this problem on 4.6 or
> > 4.7. A tree with all drivers merged is available in the development
> > branch of git://git.libc.org/linux-sh; it's regularly rebased but
> > there's a perma-tag devel-20160801 you can check out instead. Fetching
> > into an existing clone of Linus's tree is recommended since the server
> > has limited resources.
> >
> > I doubt the code from my devel branch will be useful without hardware
> > to test on, though. Hopefully the issue is reproducible on other
> > systems. Very low cpu speed (50 MHz) could be a factor. The problem
> > seems to go away with dyntick disabled.
>
> Indeed, it is quite possible that RCU's self-tuning needs a bit of help
> for CPUs below a few hundred MHz. As might other pieces of the kernel.
> Should not be a big deal (famous last words...). However, given that
> you are running HZ=100, it should be reasonably close. So first, let's
> see what is going on here...
>
> The following dmesg line usually indicates that RCU's kernel thread is
> stuck for some reason.
>
> [42039.188293] rcu_sched kthread starved for 2152 jiffies! g70249 c70248 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x1
>
> The RCU_GP_WAIT_FQS indicates that it is waiting for a timer to expire.
> The task is in TASK_INTERRUPTIBLE state, but it should have been awakened
> after only a few jiffies sleep time. The stack trace looks like there
> might be trouble canceling the timer.

Can you offer any advice on tracking down what it's waiting for, or
what's going on with the timer? Is any arch- or clockevent-specific
stuff that might be my fault involved here?

> Is it possible that the timer
> handler is being prevented from making progress due to heavy interrupt
> load or some such?

Actually the stall messages are only appearing when the system is idle
and interrupts are infrequent. That's what leads me to believe it
might be a situation where, on every (infrequent) wake, both the rcu
task and something blocking its progress happen at roughly the same
time, and the slowness of the cpu leads the rcu task to decide to wait
for the next timer/schedule event and try again -- over and over. Of
course this hypothesis is just based on a very high-level
understanding of what's going on; I'm not sufficiently familiar wilth
the implementation specifics to offer more.

One observation I just made versus old kernels where this was not
happening: I've got a lot more timer interrupts happening than before.
In 6 minutes of uptime:

CPU0 CPU1
17: 38 0 AIC 17 abce0000.ethernet
18: 811 0 AIC 18 uartlite
20: 188 121 AIC 20 ipi
72: 11295 6499 AIC 72 jcore_pit

That's 30+ timer interrupts per sec on cpu0. Previously it was more
like 1-5 per second when idle (with dyntick).

> > My .config and message log are below. Let me know if there are any
> > other details I can provide that would be helpful.
>
> In case the timer issue I am suspecting is a red herring...
>
> RCU does tune itself for different values of HZ, but it might need more
> help for 50MHz CPUs. (Although if this were the problem, I would expect
> you to be seeing excessive CPU consumption by the rcu_sched kthread.)
> But it might be worthwhile booting with rcutree.jiffies_till_sched_qs=10
> and rcutree.jiffies_till_first_fqs=10 (or thereabouts). It might also
> be necessary to use a longer RCU CPU stall warning timeout (assuming
> you don't just suppress it as above), for example, by booting with (say)
> rcupdate.rcu_cpu_stall_timeout=60.

With that in mind, it seems plausible that the problem existed before
and just went over the threshold for reporting the stall with a
non-bug change going into 4.8. Would building an older known-good
kernel with TREE_RCU_TRACE enabled and checking debugfs
rcu/rcu_sched/rcugp for max=??? be informative?

> Is there any way for RCU to determine at boot time what the nominal CPU
> speed is? I suppose I could look at BogoMIPs, but is there something
> better?

I'm not aware of any clean cross-arch way to do it.

Rich