Re: [RFC/RFT][PATCH v4 1/2] cpufreq: New governor using utilization data from the scheduler
From: Rafael J. Wysocki
Date: Fri Feb 26 2016 - 19:20:17 EST
On Thursday, February 25, 2016 10:14:35 PM Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
>
> Add a new cpufreq scaling governor, called "schedutil", that uses
> scheduler-provided CPU utilization information as input for making
> its decisions.
>
> Doing that is possible after commit fe7034338ba0 (cpufreq: Add
> mechanism for registering utilization update callbacks) that
> introduced cpufreq_update_util() called by the scheduler on
> utilization changes (from CFS) and RT/DL task status updates.
> In particular, CPU frequency scaling decisions may be based on
> the the utilization data passed to cpufreq_update_util() by CFS.
>
> The new governor is very simple. It is almost as simple as it
> can be and remain reasonably functional.
>
> The frequency selection formula used by it is essentially the same
> as the one used by the "ondemand" governor, although it doesn't use
> the additional up_threshold parameter, but instead of computing the
> load as the "non-idle CPU time" to "total CPU time" ratio, it takes
> the utilization data provided by CFS as input. More specifically,
> it represents "load" as the util/max ratio, where util and max
> are the utilization and CPU capacity coming from CFS.
>
> All of the computations are carried out in the utilization update
> handlers provided by the new governor. One of those handlers is
> used for cpufreq policies shared between multiple CPUs and the other
> one is for policies with one CPU only (and therefore it doesn't need
> to use any extra synchronization means). The only operation carried
> out by the new governor's ->gov_dbs_timer callback, sugov_set_freq(),
> is a __cpufreq_driver_target() call to trigger a frequency update (to
> a value already computed beforehand in one of the utilization update
> handlers). This means that, at least for some cpufreq drivers that
> can update CPU frequency by doing simple register writes, it should
> be possible to set the frequency in the utilization update handlers
> too in which case all of the governor's activity would take place in
> the scheduler paths invoking cpufreq_update_util() without the need
> to run anything in process context.
>
> Currently, the governor treats all of the RT and DL tasks as
> "unknown utilization" and sets the frequency to the allowed
> maximum when updated from the RT or DL sched classes. That
> heavy-handed approach should be replaced with something more
> specifically targeted at RT and DL tasks.
>
> To some extent it relies on the common governor code in
> cpufreq_governor.c and it uses that code in a somewhat unusual
> way (different from what the "ondemand" and "conservative"
> governors do), so some small and rather unintrusive changes
> have to be made in that code and the other governors to support it.
>
> However, after making it possible to set the CPU frequency from
> the utilization update handlers, that new governor's interactions
> with the common code might be limited to the initialization, cleanup
> and handling of sysfs attributes (currently only one attribute,
> sampling_rate, is supported in addition to the standard policy
> attributes handled by the cpufreq core).
>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
> ---
>
> There was no v3 of this patch, but patch [2/2] had a v3 in the meantime.
>
> Changes from v2:
> - Avoid requesting the same frequency that was requested last time for
> the given policy.
>
> Changes from v1:
> - Use policy->min and policy->max as min/max frequency in computations.
>
Well, this is getting interesting. :-)
Some preliminary results from SpecPower indicate that this governor (with
patch [2/2] on top), as simple as it is, beats "ondemand" in performance/power
and generally allows the system to achieve better performance. The difference
is not significant, but measurable.
If that is confirmed in further testing, I will be inclined to drop this thing
in in the next cycle.
Thanks,
Rafael