Re: [PATCH 2/3] sched/fair: Introduce scaled capacity awareness in select_idle_sibling code path
From: Joel Fernandes
Date: Tue Oct 03 2017 - 00:53:23 EST
Hi Rohit,
On Thu, Sep 28, 2017 at 8:09 AM, Rohit Jain <rohit.k.jain@xxxxxxxxxx> wrote:
[..]
>>>
>>> With this case, because we know from the past avg, one of the strands is
>>> running low on capacity, I am trying to return a better strand for the
>>> thread to start on.
>>>
>> I know what you're trying to do, but the way you've retrofitted it into
>> the
>> core looks weird (to me) and makes the code unreadable and ugly IMO.
>>
>> Why not do something simpler, like skipping the core if any SMT thread has
>> been running at lesser capacity? I'm not sure if this works great, or which
>> approach the maintainers will prefer, but I find the below diff much
>> cleaner for the select_idle_core bit. It also makes more sense to me:
>> since resources are shared at the SMT level, it's reasonable to skip the
>> core altogether in this case:
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 6ee7242dbe0a..f324a84e29f1 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -5738,14 +5738,17 @@ static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int target)
>>  	for_each_cpu_wrap(core, cpus, target) {
>>  		bool idle = true;
>> +		bool full_cap = true;
>>
>>  		for_each_cpu(cpu, cpu_smt_mask(core)) {
>>  			cpumask_clear_cpu(cpu, cpus);
>>  			if (!idle_cpu(cpu))
>>  				idle = false;
>> +			if (!full_capacity(cpu))
>> +				full_cap = false;
>>  		}
>>
>> -		if (idle)
>> +		if (idle && full_cap)
>>  			return core;
>>  	}
>>
>
>
>
> Well, with your changes you will skip over fully idle cores, which is not
> ideal either. I see that you were advocating for selecting an idle +
> lowest-capacity core, whereas I was stopping at the first idle core.
>
> Since the whole philosophy in this patch so far has been "don't spare an
> idle CPU", I think the following diff might look better to you. Please
> note this is for discussion's sake only; I haven't fully tested it yet.
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index ec15e5f..c2933eb 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6040,7 +6040,9 @@ void __update_idle_core(struct rq *rq)
>  static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int target)
>  {
>  	struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
> -	int core, cpu;
> +	int core, cpu, rcpu, backup_core;
> +
> +	rcpu = backup_core = -1;
>
>  	if (!static_branch_likely(&sched_smt_present))
>  		return -1;
> @@ -6052,15 +6054,34 @@ static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int target)
>
>  	for_each_cpu_wrap(core, cpus, target) {
>  		bool idle = true;
> +		bool full_cap = true;
>
>  		for_each_cpu(cpu, cpu_smt_mask(core)) {
>  			cpumask_clear_cpu(cpu, cpus);
>  			if (!idle_cpu(cpu))
>  				idle = false;
> +
> +			if (!full_capacity(cpu)) {
> +				full_cap = false;
> +			}
>  		}
>
> -		if (idle)
> +		if (idle && full_cap)
>  			return core;
> +		else if (idle && backup_core == -1)
> +			backup_core = core;
>  	}
>
> +	if (backup_core != -1) {
> +		for_each_cpu(cpu, cpu_smt_mask(backup_core)) {
> +			if (full_capacity(cpu))
> +				return cpu;
> +			else if ((rcpu == -1) ||
> +				 (capacity_of(cpu) > capacity_of(rcpu)))
> +				rcpu = cpu;
> +		}
> +
> +		return rcpu;
> +	}
>
>
> Do let me know what you think.
I think that if your tests show no benefit from the above over the
simpler approach, then I prefer the simpler approach, especially since
there's no point in complicating the select_idle_core code.
thanks,
- Joel