Re: [PATCH 1/3] sched/fair: Introduce scaled capacity awareness in find_idlest_cpu code path

From: Joel Fernandes
Date: Thu Oct 12 2017 - 17:47:25 EST


On Thu, Oct 12, 2017 at 10:03 AM, Rohit Jain <rohit.k.jain@xxxxxxxxxx> wrote:
> Hi Joel, Atish,
>
> Moving off-line discussions to LKML, just so everyone's on the same page,
> I actually like this version now and it is outperforming my previous
> code, so I am on board with this version. It makes the code simpler too.

I think you should have explained what the version does differently.
Nobody can read your mind.

>
> Since we need a fast way of returning an idle cpu in select_idle_sibling
> path, I think that can remain as it is (or may be we can argue about the
> patch on that thread)

This is hardly an explanation of the diff below.

>
> If what I said abovemakes sense to everyone, I will send out a v6.
>
> As always, please let me know what you think.

More below:

> Thanks,
> Rohit
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 56f343b..a1f622c 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5724,7 +5724,7 @@ static int cpu_util_wake(int cpu, struct task_struct
> *p);
>
> static unsigned long capacity_spare_wake(int cpu, struct task_struct *p)
> {
> - return capacity_orig_of(cpu) - cpu_util_wake(cpu, p);
> + return capacity_of(cpu) - cpu_util_wake(cpu, p);
> }
>
> /*
> @@ -5870,6 +5870,7 @@ find_idlest_group_cpu(struct sched_group *group,
> struct task_struct *p, int this
> unsigned long load, min_load = ULONG_MAX;
> unsigned int min_exit_latency = UINT_MAX;
> u64 latest_idle_timestamp = 0;
> + unsigned int idle_cpu_cap = 0;
> int least_loaded_cpu = this_cpu;
> int shallowest_idle_cpu = -1;
> int i;
> @@ -5881,6 +5882,7 @@ find_idlest_group_cpu(struct sched_group *group,
> struct task_struct *p, int this
> /* Traverse only the allowed CPUs */
> for_each_cpu_and(i, sched_group_span(group), &p->cpus_allowed) {
> if (idle_cpu(i)) {
> + int idle_candidate = -1;
> struct rq *rq = cpu_rq(i);
> struct cpuidle_state *idle = idle_get_state(rq);
> if (idle && idle->exit_latency < min_exit_latency) {
> @@ -5891,7 +5893,7 @@ find_idlest_group_cpu(struct sched_group *group,
> struct task_struct *p, int this
> */
> min_exit_latency = idle->exit_latency;
> latest_idle_timestamp = rq->idle_stamp;
> - shallowest_idle_cpu = i;
> + idle_candidate = i;
> } else if ((!idle || idle->exit_latency == min_exit_latency) &&
> rq->idle_stamp > latest_idle_timestamp) {
> /*
> @@ -5900,8 +5902,14 @@ find_idlest_group_cpu(struct sched_group *group,
> struct task_struct *p, int this
> * a warmer cache.
> */
> latest_idle_timestamp = rq->idle_stamp;
> - shallowest_idle_cpu = i;
> + idle_candidate = i;
> }
> +
> + if (idle_candidate != -1 &&
> + (capacity_of(idle_candidate) > idle_cpu_cap)) {
> + shallowest_idle_cpu = idle_candidate;
> + idle_cpu_cap = capacity_of(idle_candidate);
> + }

This is broken, incase idle_candidate != -1 but idle_cpu_cap makes the
condition false - you're still setting min_exit_latency which is
wrong.

Also this means if you have 2 CPUs and 1 is in a shallower idle state
than the other, but lesser in capacity, then it would select the CPU
with less shallow idle state right? So 'shallowest_idle_cpu' loses its
meaning.

thanks,

- Joel

[..]