Re: [RFC/RFT][PATCH 4/7] cpuidle: menu: Split idle duration prediction from state selection

From: Peter Zijlstra
Date: Tue Mar 06 2018 - 03:46:10 EST


On Tue, Mar 06, 2018 at 10:15:10AM +0800, Li, Aubrey wrote:
> On 2018/3/5 21:53, Peter Zijlstra wrote:
> > On Mon, Mar 05, 2018 at 02:05:10PM +0100, Rafael J. Wysocki wrote:
> >> On Mon, Mar 5, 2018 at 1:50 PM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> >>> On Mon, Mar 05, 2018 at 12:47:23PM +0100, Rafael J. Wysocki wrote:
> >
> >>>> IOW, the target residency of the selected state doesn't tell you how
> >>>> much time you should expect to be idle in general.
> >>>
> >>> Right, but I think that measure isn't of primary relevance. What we want
> >>> to know is: 'should I stop the tick' and 'what C state do I go to'.
>
> I understand the benefit of mapping duration to state number. Is the duration <->
> state number mapping a generic solution for all arches?

Yes, all platforms have a limited set of possible idle states.
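
To make the duration <-> state mapping concrete, here is a minimal sketch.
The struct and function names are hypothetical and are not the kernel's
struct cpuidle_state or any governor's code. It assumes the states are
sorted from shallowest to deepest, with target residency and exit latency
both growing with depth.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified idle state; not the kernel's struct cpuidle_state. */
struct idle_state {
	unsigned int target_residency_us; /* minimum idle time for the state to pay off */
	unsigned int exit_latency_us;     /* time needed to leave the state */
};

/*
 * Return the index of the deepest state whose target residency fits the
 * predicted idle duration and whose exit latency stays within the limit.
 * Assumes states[] is sorted from shallowest (index 0) to deepest.
 */
static int duration_to_state(const struct idle_state *states, size_t n,
			     unsigned int predicted_us,
			     unsigned int latency_limit_us)
{
	int idx = 0; /* state 0 is the fallback */
	size_t i;

	assert(n > 0);

	for (i = 1; i < n; i++) {
		if (states[i].target_residency_us > predicted_us)
			break;
		if (states[i].exit_latency_us > latency_limit_us)
			break;
		idx = (int)i;
	}
	return idx;
}
```

Because the table is finite, any predicted duration collapses to one of
n indices, whatever the architecture.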

> Back to the user's concern: "I'm running a latency sensitive application, and
> I want idle switching ASAP". So I think the user may not care about which C state
> to go into; that is, even if there is a chance to enter a deeper state, a user
> striving for a higher workload score may still not want it?

The user caring about performance very much cares about the actual idle
state too; exit latency for deeper states is horrific and will screw
them up just as much as the whole nohz timer reprogramming does.

We can basically view the whole nohz thing as an additional entry/exit
latency for the idle state, which is why I don't think it's weird to
couple them.

> >> Maybe just return a "nohz" indicator from cpuidle_select() in addition
> >> to the state index and make the decision in the governor?
> >
> > Much better option than returning a duration :-)
> >
> So what does "nohz = disable and state index = deepest" mean? This combination
> does not make sense for a performance-only purpose?

I tend to agree with you that the state space allowed by a separate
variable is larger than required, but it's significantly smaller than
preserving 'time' so I can live with it.
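
A rough sketch of what "state index plus a nohz indicator" could look
like follows. The type, the function names and the tick policy are all
assumptions made for illustration; none of this is the cpuidle_select()
interface or an actual governor.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical result of a select call: a state index plus a "nohz" flag. */
struct idle_choice {
	unsigned int state_idx; /* which idle state to enter */
	bool stop_tick;         /* true: stop the tick (nohz) */
};

/* Number of distinct (state, nohz) answers for nr_states idle states. */
static unsigned int idle_choice_space(unsigned int nr_states)
{
	return 2u * nr_states;
}

/*
 * Assumed policy, for illustration only: stop the tick when the predicted
 * idle duration is at least one tick period, otherwise leave it running.
 */
static struct idle_choice make_idle_choice(unsigned int state_idx,
					   unsigned int nr_states,
					   unsigned int predicted_us,
					   unsigned int tick_period_us)
{
	struct idle_choice c;

	assert(state_idx < nr_states);

	c.state_idx = state_idx;
	c.stop_tick = predicted_us >= tick_period_us;
	return c;
}
```

With nr_states idle states the governor can give at most 2 * nr_states
different answers, some of which (deepest state with the tick left
running, say) may never be picked. A returned duration, by contrast, can
take any value in its range.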