Re: [RFC PATCH v1 00/11] Create fast idle path for short idle periods

From: Paul E. McKenney
Date: Wed Jul 12 2017 - 14:46:40 EST


On Wed, Jul 12, 2017 at 07:17:56PM +0200, Peter Zijlstra wrote:
> On Wed, Jul 12, 2017 at 08:54:58AM -0700, Paul E. McKenney wrote:
> > On Wed, Jul 12, 2017 at 02:22:49PM +0200, Peter Zijlstra wrote:
> > > On Tue, Jul 11, 2017 at 11:09:31AM -0700, Paul E. McKenney wrote:
> > > > On Tue, Jul 11, 2017 at 06:34:22PM +0200, Peter Zijlstra wrote:
> > > > > Also, RCU_FAST_NO_HZ will make a fairly large difference here.. Paul
> > > > > what's the state of that thing, do we actually want that or not?
> > > >
> > > > If you are battery powered and don't have tight real-time latency
> > > > constraints, you want it -- it has represented a 30-40% boost in battery
> > > > lifetime for some low-utilization battery-powered devices. Otherwise,
> > > > probably not.
> > >
> > > Would it make sense to hook that off of tick_nohz_idle_enter(); in
> > > specific the part where we actually stop the tick; instead of every
> > > idle?
> >
> > The actions RCU takes on RCU_FAST_NO_HZ depend on the current state of
> > the CPU's callback lists, so it seems to me that the decision has to
> > be made on each idle entry.
> >
> > Now it might be possible to make the checks more efficient, and doing
> > that is on my list.
> >
> > Or am I missing your point?
>
> Could be I'm just not remembering how all that works.. But I was
> wondering if we can do the expensive bits if we've decided to actually
> go NOHZ and avoid doing it on every idle entry.
>
> IIRC the RCU fast NOHZ bits try and flush the callback list (or pawn it
> off to another CPU?) such that we can go NOHZ sooner. Having a !empty
> callback list prevents NOHZ from happening.

The code did indeed attempt to flush the callback list back in the day,
but that proved to not actually save any power. There were several
variations in the meantime, but what it does now is to check to see if
there are callbacks at rcu_needs_cpu() time:

1. If there are none, RCU tells the caller that it doesn't need
the CPU.

2. If there are some, and some of them are non-lazy (as in doing
something other than just freeing memory), RCU updates its idea
of which grace period the callbacks are waiting for, otherwise
leaves the callbacks alone, but returns saying that it needs
the CPU in about four jiffies (by default), rounded to allow
one wakeup to handle all CPUs in the power domain. Use the
rcu_idle_gp_delay boot/sysfs parameter to adjust the wait
duration if required. (I haven't heard of adjustment ever
being required.)

Note that a non-lazy callback might well be synchronize_rcu(),
so we cannot wait too long, or we will be delaying things
too much.

3. If there are some callbacks, and all of them are lazy, RCU
again updates its idea of which grace period the callbacks are
waiting for, otherwise leaves the callbacks alone, but returns
saying that it needs the CPU in about six seconds (by default),
this time using round_jiffies(), again to share wakeups
within a power domain. Use the rcu_idle_lazy_gp_delay
boot/sysfs parameter to adjust the wait, and again, as far as
I know adjustment has never been necessary.

When the CPU is awakened, it will update its callbacks based on any
grace periods that have elapsed in the meantime. There is a bit
of work later at rcu_idle_enter() time, but it is quite small.
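
As a rough illustration of the three cases above, here is a user-space
sketch. It is not the kernel's code: the struct, the function names, and
the SKETCH_* constants are all made up for illustration, HZ is assumed to
be 1000, and rounding to a whole second is only a stand-in for what
round_jiffies() does.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative values only; every name in this sketch is hypothetical. */
#define SKETCH_HZ            1000UL              /* assumed jiffies per second */
#define SKETCH_GP_DELAY      4UL                 /* "about four jiffies" */
#define SKETCH_LAZY_GP_DELAY (6UL * SKETCH_HZ)   /* "about six seconds" */

/* Hypothetical summary of one CPU's callback lists. */
struct cb_summary {
	unsigned long n_cbs;   /* all queued callbacks */
	unsigned long n_lazy;  /* those that do nothing but free memory */
};

/* Round j up to a multiple of gran so that nearby CPUs share a wakeup. */
static unsigned long round_up_to(unsigned long j, unsigned long gran)
{
	return ((j + gran - 1) / gran) * gran;
}

/*
 * Return false if RCU does not need this CPU.  Otherwise return true
 * and store in *wake_at the jiffy at which the CPU should be awakened.
 */
static bool sketch_rcu_needs_cpu(const struct cb_summary *cbs,
				 unsigned long now, unsigned long *wake_at)
{
	assert(cbs->n_lazy <= cbs->n_cbs);

	if (cbs->n_cbs == 0)
		return false;	/* Case 1: no callbacks at all. */

	if (cbs->n_lazy < cbs->n_cbs)
		/* Case 2: at least one non-lazy callback, so a short wait. */
		*wake_at = round_up_to(now + SKETCH_GP_DELAY, SKETCH_GP_DELAY);
	else
		/* Case 3: all callbacks are lazy, so a long wait. */
		*wake_at = round_up_to(now + SKETCH_LAZY_GP_DELAY, SKETCH_HZ);
	return true;
}
```

With now = 1001, a CPU holding at least one non-lazy callback would be
asked to wake at jiffy 1008, and a CPU holding only lazy callbacks at
jiffy 8000.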

> Now if we've already decided we can't in fact go NOHZ due to other
> concerns, flushing the callback list is pointless work. So I'm thinking
> we can find a better place to do this.

True, if the tick will still be happening, there is little point
in bothering RCU about it. And if CPUs tend to go idle with RCU
callbacks, then it would be cheaper to check arch_needs_cpu() and
irq_work_needs_cpu() first. If CPUs tend to be free of callbacks
when they go idle, this won't help, and might be counterproductive.

But if rcu_needs_cpu() or rcu_prepare_for_idle() is showing up on
profiles, I could adjust things. This would include making
rcu_prepare_for_idle() no longer expect that rcu_needs_cpu() had
previously been called on the current path to idle. (Not a big
deal, just that the obvious change to tick_nohz_stop_sched_tick()
won't necessarily do what you want.)
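
To make the ordering point concrete, here is a user-space sketch of
checking the cheap conditions first, so that the RCU check only runs when
nothing else is already keeping the tick. The names are hypothetical
stand-ins, not the kernel functions, and the sketch deliberately ignores
the rcu_prepare_for_idle() dependency mentioned above.

```c
#include <assert.h>
#include <stdbool.h>

/* Counts how often the stand-in for the RCU check actually runs. */
static int rcu_checks_run;

/* Hypothetical stand-in for the RCU check; reports "no callbacks". */
static bool sketch_rcu_check(void)
{
	rcu_checks_run++;
	return false;
}

/*
 * Return true if the tick must keep running.  The two cheap answers are
 * passed in as plain flags; the || operators short-circuit, so
 * rcu_check() is skipped whenever either flag is already true.
 */
static bool sketch_tick_must_stay(bool arch_needs, bool irq_work_needs,
				  bool (*rcu_check)(void))
{
	assert(rcu_check);
	return arch_needs || irq_work_needs || rcu_check();
}
```

If the cheap checks are usually false, as on CPUs that tend to go idle
without anything else pending, this ordering buys nothing, because the
RCU check runs every time anyway.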

So please let me know if rcu_needs_cpu() or rcu_prepare_for_idle() is a
prominent contributor to to-idle latency.

Thanx, Paul