Re: [RFC PATCH v1 00/11] Create fast idle path for short idle periods
From: Paul E. McKenney
Date: Thu Jul 13 2017 - 11:20:43 EST
On Thu, Jul 13, 2017 at 04:53:11PM +0200, Peter Zijlstra wrote:
> On Thu, Jul 13, 2017 at 10:48:55PM +0800, Li, Aubrey wrote:
>
> > - In total, from arch_cpu_idle_enter entry to arch_cpu_idle_exit return costs
> > 9122ns - 15318ns.
> > ---- Within this period (arch idle), rcu_idle_enter costs 1985ns - 2262ns and
> > rcu_idle_exit costs 1813ns - 3507ns.
> >
> > Besides RCU,
>
> So Paul wants more details on where RCU hurts so we can try to fix.
More specifically: rcu_needs_cpu(), rcu_prepare_for_idle(),
rcu_cleanup_after_idle(), rcu_eqs_enter(), rcu_eqs_enter_common(),
rcu_dynticks_eqs_enter(), do_nocb_deferred_wakeup(),
rcu_dynticks_task_enter(), rcu_eqs_exit(), rcu_eqs_exit_common(),
rcu_dynticks_task_exit(), rcu_dynticks_eqs_exit().
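Grouping those by name into the entry and exit sides gives a rough picture (a sketch only; the exact call nesting and which functions are compiled in vary by kernel version and config):

```
/* Idle entry side (roughly in order): */
rcu_needs_cpu();              /* may this CPU's tick be stopped? */
rcu_prepare_for_idle();       /* CONFIG_RCU_FAST_NO_HZ callback handling */
do_nocb_deferred_wakeup();    /* deferred no-CBs kthread wakeups */
rcu_eqs_enter();
  rcu_eqs_enter_common();
    rcu_dynticks_eqs_enter(); /* the dyntick-idle counter update itself */
rcu_dynticks_task_enter();

/* Idle exit side: */
rcu_dynticks_task_exit();
rcu_eqs_exit();
  rcu_eqs_exit_common();
    rcu_dynticks_eqs_exit();
rcu_cleanup_after_idle();     /* CONFIG_RCU_FAST_NO_HZ */
```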
The first three (rcu_needs_cpu(), rcu_prepare_for_idle(), and
rcu_cleanup_after_idle()) should not be significant unless you have
CONFIG_RCU_FAST_NO_HZ=y. If you do, it would be interesting to learn
how often invoke_rcu_core() is invoked from rcu_prepare_for_idle()
and rcu_cleanup_after_idle(), as this can raise softirq. The invocation
counts for rcu_accelerate_cbs() and rcu_try_advance_all_cbs() would also
be interesting.
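One way to collect those counts is the ftrace function profiler (a sketch, assuming root, tracefs mounted at the usual location, and CONFIG_FUNCTION_PROFILER=y; note that invoke_rcu_core() is static and may be inlined, in which case it will not appear in the output):

```
cd /sys/kernel/debug/tracing
echo rcu_prepare_for_idle rcu_cleanup_after_idle invoke_rcu_core \
     rcu_accelerate_cbs rcu_try_advance_all_cbs > set_ftrace_filter
echo 1 > function_profile_enabled
sleep 10                        # run the workload of interest meanwhile
echo 0 > function_profile_enabled
grep -H . trace_stat/function*  # per-CPU hit counts and average times
echo > set_ftrace_filter        # clear the filter when done
```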
Knowing which of these is causing the most trouble might help me
reduce the overhead in the current idle path.
Also, how big is this system? If you can say, about what is the cost
of a cache miss to some other CPU's cache?
Thanx, Paul
> > the period includes C-state selection on x86, a few timestamp updates,
> > and a few computations in the menu governor. Also, deep HW C-state latency
> > can be 100+ microseconds; even when the system is very busy, the CPU still
> > has a chance to enter a deep C-state, which I suspect some bursty
> > workloads are not happy with.
> >
> > That's my major concern without a fast idle path.
>
> Fixing C-state selection by creating an alternative idle path sounds so
> very wrong.
>