Re: [PATCH v5 09/14] sched: Add over-utilization/tipping point indicator

From: Vincent Guittot
Date: Thu Aug 09 2018 - 05:31:12 EST

On Tue, 24 Jul 2018 at 14:26, Quentin Perret <quentin.perret@xxxxxxx> wrote:
> From: Morten Rasmussen <morten.rasmussen@xxxxxxx>
> Energy-aware scheduling is only meant to be active while the system is
> _not_ over-utilized. That is, there are spare cycles available to shift
> tasks around based on their actual utilization to get a more
> energy-efficient task distribution without depriving any tasks. When
> above the tipping point task placement is done the traditional way based
> on load_avg, spreading the tasks across as many cpus as possible based
> on priority scaled load to preserve smp_nice. Below the tipping point we
> want to use util_avg instead. We need to define a criteria for when we
> make the switch.
> The util_avg for each cpu converges towards 100% (1024) regardless of

remove the "(1024)" because util_avg converges to max cpu capacity
which can be different from 1024

> how many task additional task we may put on it. If we define
> over-utilized as:
> sum_{cpus}(rq.cfs.avg.util_avg) + margin > sum_{cpus}(rq.capacity)
> some individual cpus may be over-utilized running multiple tasks even
> when the above condition is false. That should be okay as long as we try
> to spread the tasks out to avoid per-cpu over-utilization as much as
> possible and if all tasks have the _same_ priority. If the latter isn't
> true, we have to consider priority to preserve smp_nice.