Re: [RFC 08/14] sched/tune: add detailed documentation

From: Steve Muckle
Date: Wed Sep 09 2015 - 16:16:21 EST


Hi Patrick,

On 09/03/2015 02:18 AM, Patrick Bellasi wrote:
> In my view, one of the main goals of sched-DVFS is actually that to be
> a solid and generic replacement of different CPUFreq governors.
> Being driven by the scheduler, sched-DVFS can exploit information on
> CPU demand of active tasks in order to select the optimal Operating
> Performance Point (OPP) using a "proactive" approach instead of the
> "reactive" approach commonly used by existing governors.

I'd agree that with knowledge of CPU demand on a per-task basis, rather
than the aggregate per-CPU demand that cpufreq governors use today, it
is possible to proactively address changes in CPU demand which result
from task migrations, task creation and exit, etc.

That said, I believe setting the OPP based on a given historical
profile of task load still relies on a heuristic algorithm of some
sort, where there is no single right answer. I am concerned about
whether sched-dvfs and SchedTune, as currently proposed, will support
enough of a range of possible heuristics/policies to effectively replace
the existing cpufreq governors.

The two most popular governors for normal operation in the mobile world:

* ondemand: Samples periodically; CPU usage is calculated as the simple
busy fraction of the last X ms window of time. Goes straight to fmax
when load exceeds the up_threshold tunable (a percentage), otherwise
scales frequency proportionally with load. Can be made to stay at fmax
longer before re-evaluating by configuring the sampling_down_factor
tunable.

* interactive: Samples periodically; CPU usage is calculated as the
simple busy fraction of the last X ms window of time. Goes to an
intermediate tunable freq (hispeed_freq) when load exceeds a
user-definable threshold (go_hispeed_load). Otherwise strives to
maintain the CPU usage set by the user in the "target_loads" array.
Other knobs that affect behavior include min_sample_time (min time to
spend at a freq before slowing down) and above_hispeed_delay (allows
various delays before raising speed further above hispeed_freq).
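
To make the difference between the two rules concrete, here is a
minimal Python sketch of the frequency selection described above. It is
a toy model, not the governors' actual code: the function names, the
default values, the clamp to fmin, and the use of a single target load
in place of the "target_loads" array are all assumptions for
illustration, and the timing knobs are left out entirely.

```python
def ondemand_target(load_pct, fmin, fmax, up_threshold=80):
    """Toy ondemand rule: jump to fmax above up_threshold, else scale."""
    if load_pct > up_threshold:
        return fmax
    # Proportional scaling; clamping to fmin is an assumption of this sketch.
    return max(fmin, fmax * load_pct // 100)


def interactive_target(load_pct, cur_freq, freqs, hispeed_freq,
                       go_hispeed_load=99, target_load=90):
    """Toy interactive rule with one target load instead of an array."""
    if load_pct > go_hispeed_load and cur_freq < hispeed_freq:
        return hispeed_freq
    # Pick the lowest frequency at which the same amount of work would
    # produce a load of at most target_load percent.
    for f in sorted(freqs):
        if f * target_load >= cur_freq * load_pct:
            return f
    return max(freqs)
```

With a hypothetical frequency table of 300, 600, 900, 1200 and 1500 MHz
(in kHz), a 50% load gives 750000 under the ondemand rule, while the
interactive rule at 900000 with 60% load drops to 600000, the lowest
step that keeps load at or below the 90% target.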

It's also worth noting that mobile vendors typically add all sorts of
hacks on top of the existing cpufreq governors which further complicate
policy.

The current proposal:

* sched-dvfs/schedtune: Event driven, CPU usage calculated using
exponential moving average. AFAICS tries to maintain some % of idle
headroom, but if that headroom doesn't exist at task_tick_fair(), goes
to max frequency. Schedtune provides a way to boost/inflate the demand
of individual tasks or overall system demand.
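
My reading of that behavior can be sketched as follows. This is a
simplified Python model under stated assumptions, not the patch set's
code: the 0..1024 capacity scale, the 20% headroom default, the function
names, and the particular way the boost inflates demand (a percentage of
the remaining spare capacity) are all hypothetical choices made for the
example. The usage value is taken as an input; the moving average that
produces it is not modelled.

```python
MAX_CAP = 1024  # hypothetical capacity scale used only in this sketch


def boost_demand(usage, boost_pct, max_cap=MAX_CAP):
    """Inflate demand by boost_pct percent of the spare capacity.

    This inflation formula is an assumption for illustration.
    """
    return usage + (max_cap - usage) * boost_pct // 100


def capacity_request(usage, cur_cap, at_tick, headroom_pct=20,
                     max_cap=MAX_CAP):
    """Return the capacity to request for a given usage."""
    # At the tick, if usage has eaten into the idle headroom of the
    # current capacity, go straight to the maximum.
    if at_tick and usage * 100 > cur_cap * (100 - headroom_pct):
        return max_cap
    # Otherwise request just enough capacity to keep the headroom idle.
    wanted = usage * 100 // (100 - headroom_pct)
    return min(max_cap, wanted)
```

For example, a usage of 400 asks for a capacity of 500 when evaluated on
an event, but jumps to 1024 if the tick finds the CPU running at a
capacity of 450, where the 20% headroom no longer exists.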

This looks a bit like ondemand to me but without the
sampling_down_factor functionality and using per-entity load tracking
instead of a simple window-based aggregate CPU usage. The interactive
functionality would require additional knobs. I don't think SchedTune
will allow for tuning the latency of CPU frequency changes
(min_sample_time, above_hispeed_delay, etc.).
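
To illustrate the kind of latency knob I mean, here is a small Python
sketch of a min_sample_time style hold on slowing down. It is only a
model of the idea as described above (a minimum time spent at a
frequency before the frequency may drop); the class name, the
microsecond timestamps and the method are hypothetical and are not taken
from the interactive governor's code.

```python
class DownscaleHold:
    """Refuse to lower the frequency until min_sample_time has elapsed."""

    def __init__(self, min_sample_time_us, start_freq):
        self.min_sample_time_us = min_sample_time_us
        self.freq = start_freq
        self.since_us = 0  # time of the last frequency change

    def apply(self, now_us, wanted_freq):
        """Return the frequency actually used at time now_us."""
        too_early = now_us - self.since_us < self.min_sample_time_us
        if wanted_freq < self.freq and too_early:
            return self.freq  # hold the current frequency a bit longer
        if wanted_freq != self.freq:
            self.freq = wanted_freq
            self.since_us = now_us
        return self.freq
```

Increases pass through immediately, while a request to slow down is
ignored until the hold time has passed. A policy layer of this kind sits
after whatever computes the wanted frequency, which is why it is easy to
bolt onto a sampling governor.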

A separate but related concern: in the (IMO likely, given the above)
case that folks want to tinker with that policy, it now means they're
hacking the scheduler as opposed to a self-contained frequency policy
plugin.

Another issue with policy (but not specific to this proposal) is that
putting a bunch of it in the CPU frequency selection may derail the
efforts of the EAS algorithm, which I'm still working on digesting.
Perhaps a unified sched/cpufreq policy could go there.

thanks,
Steve
