Re: [PATCH RFC v4] cpufreq: schedutil: Make iowait boost more energy efficient

From: Juri Lelli
Date: Mon Jul 10 2017 - 06:55:09 EST


Hi Joel,

On 09/07/17 10:08, Joel Fernandes wrote:
> Currently the iowait_boost feature in schedutil makes the frequency go to max.
> This feature was added to handle a case that Peter described where the
> throughput of operations involving continuous I/O requests [1] is reduced due
> to running at a lower frequency, however the lower throughput itself causes
> utilization to be low and hence causing frequency to be low hence its "stuck".
>
> Instead of going to max, its also possible to achieve the same effect by
> ramping up to max if there are repeated in_iowait wakeups happening. This patch
> is an attempt to do that. We start from a lower frequency (iowait_boost_min)
> and double the boost for every consecutive iowait update until we reach the
> maximum iowait boost frequency (iowait_boost_max).
>
> I ran a synthetic test on an x86 machine with intel_pstate in passive mode
> using schedutil. The patch achieves the desired effect as the existing
> behavior. Also tested on ARM64 platform and see that there the transient iowait
> requests aren't causing frequency spikes.
>
> [1] https://patchwork.kernel.org/patch/9735885/
>
> Cc: Srinivas Pandruvada <srinivas.pandruvada@xxxxxxxxxxxxxxx>
> Cc: Len Brown <lenb@xxxxxxxxxx>
> Cc: Rafael J. Wysocki <rjw@xxxxxxxxxxxxx>
> Cc: Viresh Kumar <viresh.kumar@xxxxxxxxxx>
> Cc: Ingo Molnar <mingo@xxxxxxxxxx>
> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> Suggested-by: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> Signed-off-by: Joel Fernandes <joelaf@xxxxxxxxxx>
> ---
> Changes since v1:
> - not using tunables to plainly turn off iowait boost anymore
>
> kernel/sched/cpufreq_schedutil.c | 52 ++++++++++++++++++++++++++++++++++------
> 1 file changed, 45 insertions(+), 7 deletions(-)
>
> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index 622eed1b7658..4d9e8b96bed1 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -53,7 +53,9 @@ struct sugov_cpu {
> struct update_util_data update_util;
> struct sugov_policy *sg_policy;
>
> + bool prev_iowait_boost;
> unsigned long iowait_boost;
> + unsigned long iowait_boost_min;
> unsigned long iowait_boost_max;
> u64 last_update;
>
> @@ -168,22 +170,47 @@ static void sugov_get_util(unsigned long *util, unsigned long *max)
> *max = cfs_max;
> }
>
> +static void sugov_decay_iowait_boost(struct sugov_cpu *sg_cpu)
> +{
> + sg_cpu->iowait_boost >>= 1;
> +
> + if (sg_cpu->iowait_boost < sg_cpu->iowait_boost_min)
> + sg_cpu->iowait_boost = 0;
> +}
> +
> static void sugov_set_iowait_boost(struct sugov_cpu *sg_cpu, u64 time,
> unsigned int flags)
> {
> if (flags & SCHED_CPUFREQ_IOWAIT) {
> - sg_cpu->iowait_boost = sg_cpu->iowait_boost_max;
> + /* Remember for next time that we did an iowait boost */
> + sg_cpu->prev_iowait_boost = true;
> + if (sg_cpu->iowait_boost) {
> + sg_cpu->iowait_boost <<= 1;
> + sg_cpu->iowait_boost = min(sg_cpu->iowait_boost,
> + sg_cpu->iowait_boost_max);
> + } else {
> + sg_cpu->iowait_boost = sg_cpu->iowait_boost_min;
> + }
> } else if (sg_cpu->iowait_boost) {
> s64 delta_ns = time - sg_cpu->last_update;
>
> /* Clear iowait_boost if the CPU apprears to have been idle. */
> if (delta_ns > TICK_NSEC)
> sg_cpu->iowait_boost = 0;

In this case we might just also want to reset prev_iowait_boost
unconditionally and return.

> +
> + /*
> + * Since we don't decay iowait_boost when its consumed during
> + * the previous SCHED_CPUFREQ_IOWAIT update, decay it now.
> + */

Code below seems pretty self-explaning to me. :)

> + if (sg_cpu->prev_iowait_boost) {
> + sugov_decay_iowait_boost(sg_cpu);
> + sg_cpu->prev_iowait_boost = false;
> + }
> }
> }
>
> static void sugov_iowait_boost(struct sugov_cpu *sg_cpu, unsigned long *util,
> - unsigned long *max)
> + unsigned long *max, unsigned int flags)
> {
> unsigned long boost_util = sg_cpu->iowait_boost;
> unsigned long boost_max = sg_cpu->iowait_boost_max;
> @@ -195,7 +222,16 @@ static void sugov_iowait_boost(struct sugov_cpu *sg_cpu, unsigned long *util,
> *util = boost_util;
> *max = boost_max;
> }
> - sg_cpu->iowait_boost >>= 1;
> +
> + /*
> + * Incase iowait boost just happened on this CPU, don't reduce it right
> + * away since then the iowait boost will never increase on subsequent
> + * in_iowait wakeups.
> + */
> + if (flags & SCHED_CPUFREQ_IOWAIT && this_cpu_ptr(&sugov_cpu) == sg_cpu)
> + return;
> +
> + sugov_decay_iowait_boost(sg_cpu);

Mmm, do we need to decay even when we are computing frequency requests
for a domain?

Considering it a per-cpu thing, isn't enough that it gets bumped up or
decayed only when a CPU does an update (by using the above from
sugov_update_shared)?

If we go this way I think we will only need to reset prev_iowait_boost
if delta_ns > TICK_NSEC during the for_each_cpu() loop of sugov_next_
freq_shared().

Thanks,

- Juri