Re: [PATCH] sched/fair: fix sgc->{min,max}_capacity miscalculate
From: Valentin Schneider
Date: Wed Jan 01 2020 - 13:56:05 EST
On 01/01/2020 14:13, Peng Liu wrote:
>> ---
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 08a233e97a01..9f6c015639ef 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -7773,8 +7773,8 @@ void update_group_capacity(struct sched_domain *sd, int cpu)
>> */
>>
>> for_each_cpu(cpu, sched_group_span(sdg)) {
>> - struct sched_group_capacity *sgc;
>> struct rq *rq = cpu_rq(cpu);
>> + unsigned long cpu_cap;
>>
>> /*
>> * build_sched_domains() -> init_sched_groups_capacity()
>> @@ -7787,15 +7787,15 @@ void update_group_capacity(struct sched_domain *sd, int cpu)
>> * This avoids capacity from being 0 and
>> * causing divide-by-zero issues on boot.
>> */
>> - if (unlikely(!rq->sd)) {
>> - capacity += capacity_of(cpu);
>> - } else {
>> - sgc = rq->sd->groups->sgc;
>> - capacity += sgc->capacity;
>> - }
>> + if (unlikely(!rq->sd))
>> + cpu_cap = capacity_of(cpu);
>
> --------------------------------------------------------------
>> + else
>> + cpu_cap = rq->sd->groups->sgc->capacity;
>
> sgc->capacity is the *sum* of all CPU's capacity in that group, right?
Right
> {min,max}_capacity are per CPU variables(*part* of a group). So we can't
> compare *part* to *sum*. Am I overlooking something? Thanks.
>
AIUI rq->sd->groups->sgc->capacity should be the capacity of the rq's CPU
(IOW the groups here should be made of single CPUs).
This should be true regardless of overlapping domains, since they sit on top
of the regular domains. Let me paint an example with a simple 2-core SMT2
system:
MC [ ]
SMT [ ][ ]
0 1 2 3
cpu_rq(0)->sd will point to the sched_domain of CPU0 at SMT level (it is the
"base domain", IOW the lowest domain in the topology hierarchy). Its groups
will be:
{0} ----> {1}
^ /
`-----'
and sd->groups will point at the group spanning the "local" CPU, in our case
CPU0, and thus here will be a group containing only CPU0.
I do not know why sched_group_capacity is used here however. As I understand
things, we could use cpu_capacity() unconditionally.
>> +
>> + min_capacity = min(cpu_cap, min_capacity);
>> + max_capacity = max(cpu_cap, max_capacity);
>>
>> - min_capacity = min(capacity, min_capacity);
>> - max_capacity = max(capacity, max_capacity);
>> + capacity += cpu_cap;
>> }
>> } else {
>> /*