Re: [PATCH] sched/fair: fix sgc->{min,max}_capacity miscalculate

From: Peng Liu
Date: Wed Jan 01 2020 - 09:13:55 EST


On Wed, Jan 01, 2020 at 05:56:49AM +0000, Valentin Schneider wrote:
> Hi Peng,
>
> On 31/12/2019 03:51, Peng Liu wrote:
> > commit bf475ce0a3dd ("sched/fair: Add per-CPU min capacity to
> > sched_group_capacity") introduced per-cpu min_capacity.
> >
> > commit e3d6d0cb66f2 ("sched/fair: Add sched_group per-CPU max capacity")
> > introduced per-cpu max_capacity.
> >
> > sgc->capacity is the *SUM* of all CPU's capacity in the group.
> > sgc->{min,max}_capacity are the sg per-cpu variables. Compare with
> > sgc->capacity to get sgc->{min,max}_capacity makes no sense. Instead,
> > we should compare one by one in each iteration to get
> > sgc->{min,max}_capacity of the group.
> >
>
> Worth noting this only affects the SD_OVERLAP case, the other case is fine
> (I checked again just to be sure).
>
> Now, on the bright side of things I don't think this currently causes any
> harm. The {min,max}_capacity values are used in bits of code that only
> gets run on topologies with asymmetric CPU µarchs (SD_ASYM_CPUCAPACITY), and
> I know of no such system that is also NUMA, i.e. end up with SD_OVERLAP
> (here's hoping nobody gets any funny idea).
>
> Still, nice find!

Valentin, thanks for your time!

>
> > Signed-off-by: Peng Liu <iwtbavbm@xxxxxxxxx>
> > ---
> > kernel/sched/fair.c | 11 +++++++----
> > 1 file changed, 7 insertions(+), 4 deletions(-)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 2d170b5da0e3..97b164fcda93 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -7795,6 +7795,7 @@ void update_group_capacity(struct sched_domain *sd, int cpu)
> > for_each_cpu(cpu, sched_group_span(sdg)) {
> > struct sched_group_capacity *sgc;
> > struct rq *rq = cpu_rq(cpu);
> > + unsigned long cap;
> >
> > /*
> > * build_sched_domains() -> init_sched_groups_capacity()
> > @@ -7808,14 +7809,16 @@ void update_group_capacity(struct sched_domain *sd, int cpu)
> > * causing divide-by-zero issues on boot.
> > */
> > if (unlikely(!rq->sd)) {
> > - capacity += capacity_of(cpu);
> > + cap = capacity_of(cpu);
> > + capacity += cap;
> > + min_capacity = min(cap, min_capacity);
> > + max_capacity = max(cap, max_capacity);
> > } else {
> > sgc = rq->sd->groups->sgc;
> > capacity += sgc->capacity;
> > + min_capacity = min(sgc->min_capacity, min_capacity);
> > + max_capacity = max(sgc->max_capacity, max_capacity);
> > }
> > -
> > - min_capacity = min(capacity, min_capacity);
> > - max_capacity = max(capacity, max_capacity);
> > }
> > } else {
> > /*
> >
>
> All that could be shortened like the below, no?
>
> ---
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 08a233e97a01..9f6c015639ef 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -7773,8 +7773,8 @@ void update_group_capacity(struct sched_domain *sd, int cpu)
> */
>
> for_each_cpu(cpu, sched_group_span(sdg)) {
> - struct sched_group_capacity *sgc;
> struct rq *rq = cpu_rq(cpu);
> + unsigned long cpu_cap;
>
> /*
> * build_sched_domains() -> init_sched_groups_capacity()
> @@ -7787,15 +7787,15 @@ void update_group_capacity(struct sched_domain *sd, int cpu)
> * This avoids capacity from being 0 and
> * causing divide-by-zero issues on boot.
> */
> - if (unlikely(!rq->sd)) {
> - capacity += capacity_of(cpu);
> - } else {
> - sgc = rq->sd->groups->sgc;
> - capacity += sgc->capacity;
> - }
> + if (unlikely(!rq->sd))
> + cpu_cap = capacity_of(cpu);

--------------------------------------------------------------
> + else
> + cpu_cap = rq->sd->groups->sgc->capacity;

sgc->capacity is the *sum* of all CPU's capacity in that group, right?
{min,max}_capacity are per CPU variables(*part* of a group). So we can't
compare *part* to *sum*. Am I overlooking something? Thanks.

> +
> + min_capacity = min(cpu_cap, min_capacity);
> + max_capacity = max(cpu_cap, max_capacity);
>
> - min_capacity = min(capacity, min_capacity);
> - max_capacity = max(capacity, max_capacity);
> + capacity += cpu_cap;
> }
> } else {
> /*