Re: [RFC PATCH v2 0/5] sched/cpufreq: Make schedutil energy aware

From: Douglas Raillard
Date: Mon Jul 08 2019 - 09:50:03 EST




On 7/8/19 12:13 PM, Patrick Bellasi wrote:
On 03-Jul 14:38, Douglas Raillard wrote:
Hi Peter,

On 7/2/19 4:44 PM, Peter Zijlstra wrote:
On Thu, Jun 27, 2019 at 06:15:58PM +0100, Douglas RAILLARD wrote:
Make schedutil cpufreq governor energy-aware.

- patch 1 introduces a function to retrieve a frequency given a base
frequency and an energy cost margin.
- patch 2 links Energy Model perf_domain to sugov_policy.
- patch 3 updates get_next_freq() to make use of the Energy Model.


1) Selecting the highest possible frequency for a given cost. Some
platforms can have lower frequencies that are less efficient than
higher ones, in which case they should be skipped for most purposes.
They can still be useful to give more freedom to thermal throttling
mechanisms, but not under normal circumstances.
Note: the EM framework will warn about such OPPs with the message
"hertz/watts ratio non-monotonically decreasing".
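
To make the cost comparison in 1) concrete, here is a minimal user-space sketch of how such OPPs could be spotted. It is only an illustration: struct opp, opp_cost(), mark_inefficient() and the table values are hypothetical and not taken from the patches, and the cost formula (power scaled by max_freq / freq) is my reading of what the EM framework computes, so treat it as an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical OPP entry; the numbers below are made up for illustration. */
struct opp {
	unsigned long freq_khz;
	unsigned long power_mw;
};

/* Example table, sorted by ascending frequency. */
static const struct opp demo_table[] = {
	{  500000, 100 },
	{  800000, 120 },
	{ 1200000, 250 },
	{ 1800000, 500 },
};

/*
 * Assumed cost: power scaled by max_freq / freq, i.e. a value
 * proportional to the energy spent per unit of work at this OPP.
 */
static unsigned long opp_cost(const struct opp *o, unsigned long max_freq_khz)
{
	return o->power_mw * max_freq_khz / o->freq_khz;
}

/*
 * Flag every OPP for which some higher-frequency OPP has a lower or
 * equal cost. Walks the table from the top down while tracking the
 * cheapest cost seen so far. Returns the number of flagged OPPs.
 */
static size_t mark_inefficient(const struct opp *table, size_t n,
			       int *inefficient)
{
	unsigned long max_freq, min_cost_above;
	size_t count = 0;
	size_t i;

	assert(n > 0);
	max_freq = table[n - 1].freq_khz;
	min_cost_above = opp_cost(&table[n - 1], max_freq);
	inefficient[n - 1] = 0;

	for (i = n - 1; i-- > 0; ) {
		unsigned long cost = opp_cost(&table[i], max_freq);

		inefficient[i] = cost >= min_cost_above;
		if (inefficient[i])
			count++;
		else
			min_cost_above = cost;
	}

	return count;
}
```

With this made-up table the costs are 360, 270, 375 and 500, so calling mark_inefficient(demo_table, 4, flags) flags only the 500000 kHz entry: the 800000 kHz OPP does the same work for less energy.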

Humm, for some reason I was thinking we explicitly skipped those OPPs
and they already weren't used.

This isn't in fact so, and these first few patches make it so?

That's correct: the cost information about each OPP was introduced recently in mainline
by the energy model series. Without that information, the only way to skip them that comes
to mind is to set a policy min frequency, since these inefficient OPPs are usually located
at the lower end.

Perhaps it's also worth pointing out that the alternative approach you
mention above is a system-wide solution.

The ramp_boost thingy you propose, on the other hand, is a more
fine-grained mechanism which could be extended in the future to have a
per-task side. IOW, it could contribute to better user-space hints, for
example to ramp_boost certain tasks more than others.

ramp_boost and the situation you describe relate more to point 2) (which was cut out of that answer).
Point 1) is really just about avoiding the selection of some OPPs, regardless of task util. IOW, it's better to
skip the OPPs we are talking about here and race to idle at a higher OPP, regardless of what the task needs.
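
As a rough sketch of what point 1) amounts to (not the code from the series), the selection could look like the snippet below. Everything in it is hypothetical: struct opp_cost, pick_freq_within_cost(), the cost values, and the choice of expressing the margin as a percentage of the base cost are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-OPP cost entry; values are made up for illustration. */
struct opp_cost {
	unsigned long freq_khz;
	unsigned long cost;
};

/* Example table, sorted by ascending frequency. */
static const struct opp_cost demo_costs[] = {
	{  500000, 360 },	/* costs more than the next OPP up */
	{  800000, 270 },
	{ 1200000, 375 },
	{ 1800000, 500 },
};

/*
 * Return the highest frequency whose cost does not exceed the cost of
 * the OPP serving base_freq_khz by more than margin_pct percent.
 * A margin of 0 only skips OPPs that are no cheaper than a faster one.
 */
static unsigned long pick_freq_within_cost(const struct opp_cost *table,
					   size_t n,
					   unsigned long base_freq_khz,
					   unsigned int margin_pct)
{
	unsigned long max_cost;
	size_t base = 0;
	size_t best, i;

	assert(n > 0);

	/* Lowest OPP satisfying the request, clamped to the highest OPP. */
	while (base < n - 1 && table[base].freq_khz < base_freq_khz)
		base++;

	max_cost = table[base].cost + table[base].cost * margin_pct / 100;

	/* Keep the highest OPP that still fits in the cost budget. */
	best = base;
	for (i = base + 1; i < n; i++) {
		if (table[i].cost <= max_cost)
			best = i;
	}

	return table[best].freq_khz;
}
```

With this table, a request for 400000 kHz and a margin of 0 returns 800000: the 500000 kHz OPP is skipped whatever the task asked for. A non-zero margin is where something like ramp_boost would come in: a request for 800000 kHz with a 40% margin returns 1200000.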


Best,
Patrick


Cheers,
Douglas