Re: [RFC PATCH 3/4] sched/topology: remove smt_gain

From: Vincent Guittot
Date: Tue Sep 04 2018 - 05:36:37 EST


Hi Srikar,

On Tuesday 04 Sep 2018 at 01:24:24 (-0700), Srikar Dronamraju wrote:
> > Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> > Cc: Ingo Molnar <mingo@xxxxxxxxxx>
> > Cc: linux-kernel@xxxxxxxxxxxxxxx (open list)
> > Signed-off-by: Vincent Guittot <vincent.guittot@xxxxxxxxxx>
> > diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> > index 4a2e8ca..b1715b8 100644
> > --- a/kernel/sched/sched.h
> > +++ b/kernel/sched/sched.h
> > @@ -1758,9 +1758,6 @@ unsigned long arch_scale_freq_capacity(int cpu)
> > static __always_inline
> > unsigned long arch_scale_cpu_capacity(struct sched_domain *sd, int cpu)
> > {
> > - if (sd && (sd->flags & SD_SHARE_CPUCAPACITY) && (sd->span_weight > 1))
> > - return sd->smt_gain / sd->span_weight;
> > -
> > return SCHED_CAPACITY_SCALE;
>
> Without this change, the capacity_orig of an SMT would have been based
> on the number of threads.
> For example on SMT2, capacity_orig would have been 589 and
> for SMT 8, capacity_orig would have been 148.
>
> However after this change, capacity_orig of each SMT thread would be
> 1024. For example SMT 8 core capacity_orig would now be 8192.
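
For reference, the numbers above follow from an integer division of smt_gain by the number of siblings. A minimal sketch of that arithmetic, assuming the default smt_gain of 1178 set in sd_init(); the helper names are made up for illustration and are not kernel functions. Note that 1178 / 8 truncates to 147, one less than the rounded figure quoted above:

```c
#define SCHED_CAPACITY_SCALE	1024UL
/* Assumed default smt_gain: about 15% above SCHED_CAPACITY_SCALE. */
#define SMT_GAIN		1178UL

/* Per-thread capacity_orig before the patch: smt_gain shared by the siblings. */
static unsigned long smt_thread_capacity_old(unsigned long span_weight)
{
	if (span_weight > 1)
		return SMT_GAIN / span_weight;

	return SCHED_CAPACITY_SCALE;
}

/* Per-thread capacity_orig once smt_gain is removed: always the full scale. */
static unsigned long smt_thread_capacity_new(unsigned long span_weight)
{
	(void)span_weight;
	return SCHED_CAPACITY_SCALE;
}

/* Capacity of a whole core, as the sum of its threads. */
static unsigned long core_capacity(unsigned long per_thread,
				   unsigned long span_weight)
{
	return per_thread * span_weight;
}
```

So an SMT8 core goes from 8 * 147 = 1176 to 8 * 1024 = 8192 with this change.
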
>
> smt_gain was supposed to make a multi-threaded core slightly more
> powerful than a single-threaded core. I wonder if that sometimes hurt

Is there a system with both single-threaded and multi-threaded cores?
That was the main open point for me (and for Qais too).


> us when doing load balance between 2 cores, i.e. at MC or DIE sched
> domain. Even with 2 threads running on a core, the core might look
> lightly loaded (2048/8192). Hence it might dissuade movement to an idle core.

Then, there is the sibling flag at SMT level that normally ensures 1 task per
core for such a use case.

>
> I always wonder why arch_scale_cpu_capacity() is called with NULL
> sched_domain, in scale_rt_capacity(). This way capacity might actually

Probably because, until this v4.19-rcxx version, the rt scaling was done
relative to the local cpu capacity:
capacity = arch_scale_cpu_capacity() * scale_rt_capacity() / SCHED_CAPACITY_SCALE

Whereas now, it directly returns the remaining capacity.
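
The difference between the two conventions can be sketched as below. This is only an illustration: both helper names are hypothetical, and the inputs stand in for what the kernel computes from the rq.

```c
#define SCHED_CAPACITY_SCALE	1024UL

/*
 * Old convention: scale_rt_capacity() gave a factor in the range
 * [0, SCHED_CAPACITY_SCALE] and the caller applied it to the CPU's own
 * capacity, so the result could never exceed that capacity.
 */
static unsigned long capacity_from_factor(unsigned long cpu_capacity,
					  unsigned long factor)
{
	unsigned long capacity = cpu_capacity * factor / SCHED_CAPACITY_SCALE;

	return capacity ? capacity : 1;
}

/*
 * New convention: the returned value already is the remaining capacity,
 * so it is only correct if it was computed against the right maximum.
 */
static unsigned long capacity_from_remaining(unsigned long remaining)
{
	return remaining ? remaining : 1;
}
```

With the old form, a thread whose capacity is 589 and which is half consumed by rt gets 589 * 512 / 1024 = 294; with the new form the result is whatever the function computed, independent of the caller's notion of the CPU capacity.
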

> be more than the capacity_orig. I was always under the impression that
> capacity_orig > capacity. Or am I misunderstanding that?

You are right, there is a bug for SMT and the patch below should fix it.
Nevertheless, we still have the problem in some other places in the code.
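
To make the mismatch concrete, here is a simplified userspace model of scale_rt_capacity() as it looks after 523e979d3164. The function name and the plain integer inputs are assumptions for illustration; the real code reads the rt, dl and irq utilization from the rq.

```c
/*
 * Simplified model of scale_rt_capacity():
 *   max:  what arch_scale_cpu_capacity() returned for this CPU
 *   used: rt + dl utilization
 *   irq:  irq utilization
 */
static unsigned long model_scale_rt_capacity(unsigned long max,
					     unsigned long used,
					     unsigned long irq)
{
	unsigned long free;

	if (irq >= max)
		return 1;

	if (used >= max)
		return 1;

	free = max - used;

	/* Remove the share of time spent in irq context. */
	return free * (max - irq) / max;
}
```

With smt_gain still in place, an SMT2 thread has cpu_capacity_orig = 589. Called with a NULL sched_domain, max is 1024, so an idle thread gets model_scale_rt_capacity(1024, 0, 0) = 1024, which is above capacity_orig. With the sched_domain passed in, max is 589 and the result can no longer exceed it.
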

Subject: [PATCH] sched/fair: fix scale_rt_capacity() for SMT

Since commit 523e979d3164 ("sched/core: Use PELT for scale_rt_capacity()"),
scale_rt_capacity() returns the remaining capacity and not a scale factor
to apply on cpu_capacity_orig. arch_scale_cpu_capacity() is called directly
by scale_rt_capacity(), so we must pass it the sched_domain argument.

Fixes: 523e979d3164 ("sched/core: Use PELT for scale_rt_capacity()")
Reported-by: Srikar Dronamraju <srikar@xxxxxxxxxxxxxxxxxx>
Signed-off-by: Vincent Guittot <vincent.guittot@xxxxxxxxxx>
---
kernel/sched/fair.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 309c93f..c73e1fa 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7481,10 +7481,10 @@ static inline int get_sd_load_idx(struct sched_domain *sd,
 	return load_idx;
 }

-static unsigned long scale_rt_capacity(int cpu)
+static unsigned long scale_rt_capacity(struct sched_domain *sd, int cpu)
 {
 	struct rq *rq = cpu_rq(cpu);
-	unsigned long max = arch_scale_cpu_capacity(NULL, cpu);
+	unsigned long max = arch_scale_cpu_capacity(sd, cpu);
 	unsigned long used, free;
 	unsigned long irq;

@@ -7506,7 +7506,7 @@ static unsigned long scale_rt_capacity(int cpu)

 static void update_cpu_capacity(struct sched_domain *sd, int cpu)
 {
-	unsigned long capacity = scale_rt_capacity(cpu);
+	unsigned long capacity = scale_rt_capacity(sd, cpu);
 	struct sched_group *sdg = sd->groups;

 	cpu_rq(cpu)->cpu_capacity_orig = arch_scale_cpu_capacity(sd, cpu);
--
2.7.4

>
> --
> Thanks and Regards
> Srikar Dronamraju
>