Re: [RFC PATCH v3 09/10] sched/fair: Select an energy-efficient CPU on task wake-up
From: Quentin Perret
Date: Tue Jun 19 2018 - 03:57:35 EST
Hi Pavan,
On Tuesday 19 Jun 2018 at 10:36:01 (+0530), Pavan Kondeti wrote:
> On Mon, May 21, 2018 at 03:25:04PM +0100, Quentin Perret wrote:
>
> <snip>
>
> > + if (cpumask_test_cpu(prev_cpu, &p->cpus_allowed))
> > + prev_energy = best_energy = compute_energy(p, prev_cpu);
> > + else
> > + prev_energy = best_energy = ULONG_MAX;
> > +
> > + for_each_freq_domain(sfd) {
> > + unsigned long spare_cap, max_spare_cap = 0;
> > + int max_spare_cap_cpu = -1;
> > + unsigned long util;
> > +
> > + /* Find the CPU with the max spare cap in the freq. dom. */
> > + for_each_cpu_and(cpu, freq_domain_span(sfd), sched_domain_span(sd)) {
> > + if (!cpumask_test_cpu(cpu, &p->cpus_allowed))
> > + continue;
> > +
> > + if (cpu == prev_cpu)
> > + continue;
> > +
> > + /* Skip CPUs that will be overutilized */
> > + util = cpu_util_wake(cpu, p) + task_util;
> > + cpu_cap = capacity_of(cpu);
> > + if (cpu_cap * 1024 < util * capacity_margin)
> > + continue;
> > +
> > + spare_cap = cpu_cap - util;
> > + if (spare_cap > max_spare_cap) {
> > + max_spare_cap = spare_cap;
> > + max_spare_cap_cpu = cpu;
> > + }
> > + }
> > +
> > + /* Evaluate the energy impact of using this CPU. */
> > + if (max_spare_cap_cpu >= 0) {
> > + cur_energy = compute_energy(p, max_spare_cap_cpu);
> > + if (cur_energy < best_energy) {
> > + best_energy = cur_energy;
> > + best_energy_cpu = max_spare_cap_cpu;
> > + }
> > + }
> > + }
> > +
> > + /*
> > + * We pick the best CPU only if it saves at least 1.5% of the
> > + * energy used by prev_cpu.
> > + */
> > + if ((prev_energy - best_energy) > (prev_energy >> 6))
> > + return best_energy_cpu;
> > +
> > + return prev_cpu;
> > +}
>
> We are comparing the best_energy_cpu against prev_cpu even when prev_cpu
> can't accommodate the waking task. Is this intentional? Should not we
> discard the prev_cpu if it can't accommodate the task.
>
> This can potentially make a BIG task run on a lower capacity CPU until
> load balancer kicks in and corrects the situation.
We shouldn't enter find_energy_efficient_cpu() in the first place if the
system is overutilized, so that shouldn't too much of an issue in
general.
But yeah, there is one small corner case where prev_cpu is overutilized
and the system has not been flagged as such yet (when the tasks wakes-up
shortly before the tick for ex). I think it's possible to cover this case
by extending the "if (cpumask_test_cpu(prev_cpu, &p->cpus_allowed))"
condition at the top of the function with a check on capacity_margin.
I'll change that in v4.
Thanks !
Quentin