Re: [RFC PATCH v2 0/1] cpuidle: teo: Introduce optional util-awareness
From: Rafael J. Wysocki
Date: Fri Oct 28 2022 - 11:08:49 EST
On Fri, Oct 28, 2022 at 5:04 PM Rafael J. Wysocki <rafael@xxxxxxxxxx> wrote:
>
> On Fri, Oct 28, 2022 at 5:01 PM Kajetan Puchalski
> <kajetan.puchalski@xxxxxxx> wrote:
> >
> > On Fri, Oct 28, 2022 at 03:12:43PM +0200, Rafael J. Wysocki wrote:
> >
> > > > The result being that this util-aware TEO variant while using much less
> > > > C1 and decreasing the percentage of too deep sleeps from ~24% to ~3% in
> > > > PCMark Web Browsing also uses almost 2% less power. Clearly the power is
> > > > being wasted on not hitting C1 residency over and over.
> > >
> > > Hmm. The PCMark Web Browsing table in your cover letter doesn't indicate that.
> > >
> > > The "gmean power usage" there for "teo + util-aware" is 205, whereas
> > > for "teo" alone it is 187.8. This is still arguably balanced by the
> > > latency difference (~100 us vs ~185 us, respectively), but this looks
> > > like trading energy for performance.
> >
> > In this case yes, I meant 2% less compared to menu but you're right of
> > course.
> >
> > [...]
> >
> > > Definitely it should not be changed if the previous state is a polling
> > > one which can be checked right away. That would take care of the
> > > "Intel case" automatically.
> >
> > Makes sense, I already used the polling flag to implement this in this other
> > governor I mentioned.
> >
> > >
> > > > Should make it much less intense for Intel systems.
> > >
> > > So I think that this adjustment only makes sense if the current
> > > candidate state is state 1 and state 0 is not polling. In the other
> > > cases the cost of missing an opportunity to save energy would be too
> > > high for the observed performance gain.
> >
> > Interesting, but only applying it to C1 and only when C0 isn't polling would
> > make it effectively not do anything on Intel systems, right?
>
> Indeed.
>
> > From what I've seen on Doug's plots even C1 is hardly ever used on his platform, most
> > sleeps end up in the deepest possible state.
>
> That depends a lot on the workload. There are workloads in which C1
> is mostly used and the deeper idle states aren't.
>
> > Checking for the polling flag is a good idea regardless so I can send a
> > v3 with that. If you'd like me to also restrict the entire mechanism to
> > only working on C1 as you suggested then I'm okay with including that in
> > the v3 as well. What do you think?
>
> It would be good to do that and see if there are any significant
> differences in the results.
BTW, you may as well drop the extra #ifdeffery from the v3, I don't
think that it is particularly useful.