Re: A few questions and issues with dynticks, NOHZ and powertop

From: Paul E. McKenney
Date: Mon Apr 05 2010 - 17:39:08 EST


On Mon, Apr 05, 2010 at 11:03:40PM +0200, Dominik Brodowski wrote:
> Paul,
>
> I really appreciate your reply -- thanks! I've done some more testing in
> the meantime:
>
> On Sun, Apr 04, 2010 at 01:47:25PM -0700, Paul E. McKenney wrote:
> > On Sun, Apr 04, 2010 at 06:39:24PM +0200, Dominik Brodowski wrote:
> > > On Sun, Apr 04, 2010 at 11:17:37AM -0400, Alan Stern wrote:
> > > > On Sun, 4 Apr 2010, Dominik Brodowski wrote:
> > > >
> > > > > Booting a SMP-capable kernel with "nosmp", or manually offlining one CPU
> > > > > (or -- though I haven't tested it -- booting a SMP-capable kernel on a
> > > > > system with merely one CPU) means that up to about half of the calls to
> > > > > tick_nohz_stop_sched_tick() are aborted due to rcu_needs_cpu(). This is
> > > > > quite strange to me: AFAIK, RCU is an excellent tool for SMP, but not really
> > > > > needed for UP?
> > > >
> > > > I can't answer the real question here, not knowing enough about the RCU
> > > > implementation. However, your impression is wrong: RCU very definitely
> > > > _is_ useful and needed on UP systems. It coordinates among processes
> > > > (and interrupt handlers) as well as among processors.
> > >
> > > Okay, but still: can't this be sped up by much on UP (especially if
> > > CONFIG_RCU_FAST_NO_HZ is set), so that we can go to sleep right away?
> >
> > One situation that will prevent CONFIG_RCU_FAST_NO_HZ from putting the
> > machine to sleep right away is if there is an RCU callback posted that
> > spawns another RCU callback, and so on. CONFIG_RCU_FAST_NO_HZ will handle
> > one callback that spawns another, but it gives up if the second callback
> > spawns a third.
>
> Will the remaining callbacks be executed immediately afterwards (due to a
> need_resched() etc.), or only after the next tick?

Only after the next tick. To see why, imagine an RCU callback that
re-registers itself -- which is a perfectly legal thing to do. The
only thing that will happen if we run through grace periods faster is
that we will have more invocations of that same callback to deal with.

So we try for a bit, and if that doesn't get rid of all of the callbacks,
we hold off until the next jiffy.
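
To make the self-re-registering case concrete, here is a minimal userspace
sketch. It is a toy model, not kernel code: model_call_rcu(),
model_grace_period(), model_flush() and rearm_cb() are hypothetical names
made up for illustration. It only shows that forcing more grace periods
invokes the callback more often without ever emptying the queue.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct cb {
	struct cb *next;
	void (*func)(struct cb *self);
};

static struct cb *pending;	/* callbacks waiting for a grace period */
static int invocations;		/* how many times rearm_cb() has run */

/* Queue a callback to run after the next simulated grace period. */
static void model_call_rcu(struct cb *c, void (*func)(struct cb *))
{
	assert(c != NULL && func != NULL);
	c->func = func;
	c->next = pending;
	pending = c;
}

/* A callback that re-registers itself. */
static void rearm_cb(struct cb *self)
{
	invocations++;
	model_call_rcu(self, rearm_cb);
}

/* End one simulated grace period: invoke everything queued before it. */
static void model_grace_period(void)
{
	struct cb *list = pending;

	pending = NULL;		/* anything queued from here on waits for the next one */
	while (list) {
		struct cb *next = list->next;

		list->func(list);
		list = next;
	}
}

/* Force n grace periods back to back; return how many callbacks remain. */
static int model_flush(int n)
{
	int left = 0;
	struct cb *c;

	while (n-- > 0)
		model_grace_period();
	for (c = pending; c; c = c->next)
		left++;
	return left;
}
```

In this model, flushing five or eight grace periods leaves exactly one
callback queued either way, which is why giving up and waiting for the
next jiffy is the sensible thing to do.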

> > Might this be what is happening to you?
> >
> > If so, would you be willing to patch your kernel? RCU_NEEDS_CPU_FLUSHES
> > is currently set to 5, and might be set to (say) 8. This is defined
> > in kernel/rcutree_plugin.h, near line 990.
>
> Applied the patch by Lai Jiangshan, and tested 5 and 8:
>
> 5: Wakeups-from-idle: 33.4 (hrtimer_sched_timer: 78 %)
> 34% of calls to tick_nohz_stop_sched_tick fail due to
> rcu_needs_cpu()
> 8: Wakeups-from-idle: 36.5 (hrtimer_sched_timer: 83 %)
> 37% of calls to tick_nohz_stop_sched_tick fail due to
> rcu_needs_cpu()

I don't recall your posting wakeups-from-idle for the original -- did
we get any improvement? You did say "roughly 50%", but...

OK, I see what is happening...

What happens in the CONFIG_RCU_FAST_NO_HZ case is as follows:

o Check to see if the holdoff period is in effect, and if so,
  just check to see if RCU needs the CPU for later processing
  without attempting to accelerate grace periods.

o Check to see if there is some other non-dyntick-idle CPU.
  If there is, reset holdoff state and just check to see if
  RCU needs the CPU for later processing without attempting to
  accelerate grace periods.

o Check for initialization and hitting the RCU_NEEDS_CPU_FLUSHES
  limit, again doing the "just check" thing if we hit the limit.

o For each of RCU-sched and RCU-bh, note a quiescent state
  and force the grace-period machinery, noting in each case
  whether or not there are callbacks left to invoke.

o If there are callbacks left to invoke, raise RCU_SOFTIRQ.
  This softirq will process the callbacks. (Why not just invoke
  the softirq function directly? Because lockdep yells at you
  and I do not believe that this is a false positive.)

o If there are callbacks left to invoke, tell the caller that
  this CPU cannot yet enter dyntick-idle state.
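
The steps above can be sketched as a small userspace model. This is not
the code in kernel/rcutree_plugin.h: struct model_cpu, model_needs_cpu(),
model_tick() and MODEL_FLUSH_LIMIT are hypothetical names, the counter
counts up purely for readability, and the grace-period forcing step is
reduced to checking a callback count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for RCU_NEEDS_CPU_FLUSHES, using the value 5 quoted above. */
#define MODEL_FLUSH_LIMIT 5

/* Hypothetical per-CPU state for the model. */
struct model_cpu {
	bool holdoff;		/* holdoff in effect until the next tick */
	int flushes;		/* acceleration attempts since the last reset */
	int callbacks;		/* callbacks still queued */
	bool softirq_raised;	/* stands in for raising RCU_SOFTIRQ */
};

/* Returns true if this CPU cannot yet enter dyntick-idle state. */
static bool model_needs_cpu(struct model_cpu *cpu, bool other_cpu_busy)
{
	assert(cpu != NULL);

	/* Holdoff in effect: just check, no acceleration. */
	if (cpu->holdoff)
		return cpu->callbacks > 0;

	/* Some other CPU is non-dyntick-idle: reset state, just check. */
	if (other_cpu_busy) {
		cpu->flushes = 0;
		return cpu->callbacks > 0;
	}

	/* Limit hit: begin holdoff, just check. */
	if (++cpu->flushes > MODEL_FLUSH_LIMIT) {
		cpu->holdoff = true;
		return cpu->callbacks > 0;
	}

	/*
	 * Noting quiescent states and forcing the grace-period machinery
	 * would happen here; the model only looks at what is left.
	 */
	if (cpu->callbacks > 0) {
		cpu->softirq_raised = true;	/* softirq will invoke them */
		return true;			/* cannot go idle yet */
	}
	return false;
}

/* Simulated scheduling-clock tick: holdoff ends, attempts start over. */
static void model_tick(struct model_cpu *cpu)
{
	cpu->holdoff = false;
	cpu->flushes = 0;
}
```

Note that in this model a "true" return on the accelerated path always
comes with softirq_raised set, while a "true" return during holdoff does
not.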

But if we told the caller that this CPU cannot yet enter dyntick-idle
state, then we also raised RCU_SOFTIRQ. Once the softirq returns, we
should once again try to enter dyntick-idle state.

So a significant fraction of calls to rcu_needs_cpu() saying "no" (that is,
refusing entry into dyntick-idle state) does not necessarily mean that we
are taking significant time to get the grace periods and callbacks out of
the way. The funny loop involving softirq is required due to locking-design
issues.
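
A toy count shows how the refusal fraction can be large even when every
retry succeeds immediately. This is a userspace sketch under stated
assumptions, not a measurement: model_idle_loop() and struct idle_stats
are hypothetical names, each refusal is assumed to be followed at once by
a softirq that invokes all callbacks, and "every" (how often an idle entry
finds a callback queued) is an invented knob.

```c
#include <assert.h>

/* Hypothetical counters for the model. */
struct idle_stats {
	int calls;	/* simulated rcu_needs_cpu() calls */
	int refused;	/* calls that refused dyntick-idle entry */
};

/*
 * Simulate idle entries. One entry in "every" finds a callback queued.
 * A refusal raises the simulated softirq, which invokes the callback,
 * after which the idle loop immediately tries again.
 */
static struct idle_stats model_idle_loop(int entries, int every)
{
	struct idle_stats st = { 0, 0 };
	int i;

	assert(entries >= 0);
	for (i = 0; i < entries; i++) {
		int callbacks = (every > 0 && i % every == 0) ? 1 : 0;

		for (;;) {
			st.calls++;
			if (callbacks == 0)
				break;		/* tick can be stopped */
			st.refused++;
			callbacks = 0;		/* softirq ran the callback */
		}
	}
	return st;
}
```

With a callback queued at every idle entry, half of all calls are
refusals; with one at every other entry, a third are. In neither case
does the model spend any extra time awake beyond the softirq itself, so
the percentage alone does not say much about delay.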

Or are you seeing significant delays between successive calls to
rcu_needs_cpu() on your setup?

> > Another thing to try would be running with TINY_RCU, at least if it is
> > OK that RCU be non-preemptible.
>
> tick_nohz_stop_sched_tick() doesn't fail in this case because of
> rcu_needs_cpu(). However, the improvements are hardly recognizable:
>
> TINY_RCU: Wakeups-from-idle: 33.9 (hrtimer_sched_timer: 53 %)

TINY_RCU is set up to automatically do CONFIG_RCU_FAST_NO_HZ, and do
the same softirq dance, or that is the theory, anyway. Again, are you
seeing significant delays between successive calls to rcu_needs_cpu()?

> > And you did mention offlining some CPUs above.
>
> ... just for testing how NOHZ works on UP systems ;)

;-)

Thanx, Paul