Re: [PATCH 2/2] sched/fair: Fix use of NULL with find_idlest_group
From: Brendan Jackman
Date: Tue Aug 22 2017 - 06:41:46 EST
On Tue, Aug 22 2017 at 07:48, Vincent Guittot wrote:
> On 21 August 2017 at 17:21, Brendan Jackman <brendan.jackman@xxxxxxx> wrote:
>> The current use of returning NULL from find_idlest_group is broken in
> [snip]
>> ---
>> kernel/sched/fair.c | 34 +++++++++++++++++++---------------
>> 1 file changed, 19 insertions(+), 15 deletions(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 64618d768546..7cb5ed719cf9 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -5382,26 +5382,29 @@ static unsigned long capacity_spare_wake(int cpu, struct task_struct *p)
>> * domain.
>> */
>> static struct sched_group *
>> -find_idlest_group(struct sched_domain *sd, struct task_struct *p,
>> - int this_cpu, int sd_flag)
>> +find_idlest_group(struct sched_domain *sd, struct task_struct *p, int sd_flag)
>> {
>> struct sched_group *idlest = NULL, *group = sd->groups;
>> + struct sched_group *local_group = sd->groups;
>> struct sched_group *most_spare_sg = NULL;
>> - unsigned long min_runnable_load = ULONG_MAX, this_runnable_load = 0;
>> - unsigned long min_avg_load = ULONG_MAX, this_avg_load = 0;
>> + unsigned long min_runnable_load = ULONG_MAX, this_runnable_load = ULONG_MAX;
>> + unsigned long min_avg_load = ULONG_MAX, this_avg_load = ULONG_MAX;
>
> Good catch for this_runnable_load, indeed there is a problem is
> local_group is not allowed
>
>> unsigned long most_spare = 0, this_spare = 0; >> int load_idx = sd->forkexec_idx;
>> int imbalance_scale = 100 + (sd->imbalance_pct-100)/2;
>> unsigned long imbalance = scale_load_down(NICE_0_LOAD) *
>> (sd->imbalance_pct-100) / 100;
>>
>> + if (!cpumask_intersects(sched_domain_span(sd), &p->cpus_allowed))
>> + return NULL;
>
> You should put that test above at upper level in the while (sd) loop
> or even before the while(sd) loop as there is even no need to start
> this while loop in this case. There is no need to call
> find_idlest_group if there is no cpu allowed and no group to find.
> Then, you can keep the current behavior of find_idlest_group() which
> return null if there no best group than the local one
True. I have to admit the reason I didn't do this is because due to the
current "else while", doing this outside the loop would have meant
another level of indentation. But actually if I break out a new function
as in Peter's proposal I can avoid that (plus I probably shouldn't let
code beauty overrule warm-path performance :| ).
>> +
>> if (sd_flag & SD_BALANCE_WAKE)
>> load_idx = sd->wake_idx;
>>
>> do {
>> unsigned long load, avg_load, runnable_load;
>> unsigned long spare_cap, max_spare_cap;
>> - int local_group;
>> + bool group_is_local = group == local_group;
>> int i;
>>
>> /* Skip over this group if it has no CPUs allowed */
>
> [snip]
>
>> @@ -5927,8 +5927,12 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int sd_flag, int wake_f
>> continue;
>> }
>>
>> - group = find_idlest_group(sd, p, cpu, sd_flag);
>> + group = find_idlest_group(sd, p, sd_flag);
>> if (!group) {
>> + break;
>> + } else if (group == sd->groups) {
>> + new_cpu = cpu;
>
> The " new_cpu = cpu" above should be put before the while(sd) loop
> like in Peter's proposal
I don't think that would work - I've responded along with Joel in the
other subthread.
>> + /* Now try balancing at a lower domain level of cpu */
>> sd = sd->child;
>> continue;
>> }
>> --
>> 2.14.1
>>