Re: [RFC PATCH 3/4] sched/topology: remove smt_gain
From: Vincent Guittot
Date: Wed Sep 05 2018 - 03:36:57 EST
On Tue, 4 Sep 2018 at 12:37, Srikar Dronamraju
<srikar@xxxxxxxxxxxxxxxxxx> wrote:
>
> * Vincent Guittot <vincent.guittot@xxxxxxxxxx> [2018-09-04 11:36:26]:
>
> > Hi Srikar,
> >
> > On Tuesday 04 Sep 2018 at 01:24:24 (-0700), Srikar Dronamraju wrote:
> > > However after this change, capacity_orig of each SMT thread would be
> > > 1024. For example SMT 8 core capacity_orig would now be 8192.
> > >
> > > smt_gain was supposed to make a multi-threaded core slightly more
> > > powerful than a single-threaded core. I suspect that sometimes hurts
> >
> > Is there a system with both single-threaded and multi-threaded cores?
> > That was the main open point for me (and for Qais too)
> >
>
> I don't know of any systems that ship with both single-threaded and
> multi-threaded cores. However, a user can still offline a few threads in
> a core while leaving other cores untouched. I don't really know why
> somebody would want to do it. For example, some customer was toying with
> SMT 3 mode on an SMT 8 POWER8 box.
In this case, it means that we have the same core capacity whatever the
number of CPUs in the core, so a core in SMT 3 mode will be set with the
same compute capacity as a core in SMT 8 mode.
Does that still make sense?
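To put numbers on that, here is a rough sketch of the per-core arithmetic.
It assumes the pre-patch rule of sd->smt_gain / sd->span_weight per thread
with the default smt_gain of 1178, and SCHED_CAPACITY_SCALE per thread once
smt_gain is removed. The helper names are made up for illustration and are
not kernel functions.

```c
#define SCHED_CAPACITY_SCALE 1024UL
#define SMT_GAIN_DEFAULT 1178UL /* assumed default, about 15% over one thread */

/* Hypothetical helper: core capacity when each thread gets smt_gain / nr_threads. */
static unsigned long core_capacity_with_smt_gain(unsigned int nr_threads)
{
	unsigned long per_thread = nr_threads > 1 ?
		SMT_GAIN_DEFAULT / nr_threads : SCHED_CAPACITY_SCALE;

	return per_thread * nr_threads;
}

/* Hypothetical helper: core capacity when every thread is worth a full CPU. */
static unsigned long core_capacity_without_smt_gain(unsigned int nr_threads)
{
	return SCHED_CAPACITY_SCALE * nr_threads;
}
```

Under these assumptions an SMT 3 core and an SMT 8 core both come out at
1176 with smt_gain, and at 3072 and 8192 respectively once it is removed.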
>
> >
> > > us when doing load balance between 2 cores, i.e. at the MC or DIE sched
> > > domain. Even with 2 threads running on a core, the core might look
> > > lightly loaded (2048/8192), which might dissuade movement to an idle core.
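For reference, the ratio quoted above works out as follows. This is only
integer arithmetic on the numbers in this thread, not how the load balancer
actually compares groups, and load_pct() is a made-up helper.

```c
/* Hypothetical helper: load as an integer percentage of capacity. */
static unsigned long load_pct(unsigned long load, unsigned long capacity)
{
	/* Guard against a zero capacity rather than dividing by it. */
	return capacity ? load * 100 / capacity : 0;
}
```

load_pct(2048, 8192) gives 25, i.e. two fully busy threads make an SMT 8
core look a quarter loaded when each thread counts for 1024.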
> >
> > Then, there is the sibling flag at SMT level that normally ensures 1 task per
> > core for such a use case
> >
>
> Agree.
>
> > >
> > > I have always wondered why arch_scale_cpu_capacity() is called with a NULL
> > > sched_domain in scale_rt_capacity(). This way capacity might actually
> >
> > Probably because until this v4.19-rcxx version, the rt scaling was done
> > relative to the local CPU capacity:
> > capacity = arch_scale_cpu() * scale_rt_capacity / SCHED_CAPACITY_SCALE
> >
> > Whereas now, it directly returns the remaining capacity
> >
> > > be more than the capacity_orig. I have always been under the impression that
> > > capacity_orig > capacity. Or am I misunderstanding that?
> >
> > You are right, there is a bug for SMT and the patch below should fix it.
> > Nevertheless, we still have the problem in some other places in the code.
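A simplified model of the mismatch, for illustration only. It assumes the
smt_gain / span_weight rule with smt_gain = 1178, leaves out the irq scaling
step, and uses made-up names (struct sd_model, model_*) that do not exist in
the kernel.

```c
#include <stddef.h>

#define SCHED_CAPACITY_SCALE 1024UL

/* Hypothetical stand-in for struct sched_domain; only what the sketch needs. */
struct sd_model {
	unsigned long smt_gain;
	unsigned int span_weight;
};

/* Models capacity_orig: smt_gain / span_weight for SMT, full scale otherwise. */
static unsigned long model_arch_scale_cpu_capacity(const struct sd_model *sd)
{
	if (sd && sd->span_weight > 1)
		return sd->smt_gain / sd->span_weight;

	return SCHED_CAPACITY_SCALE;
}

/* Models the remaining capacity once rt/dl utilization (used) is taken out. */
static unsigned long model_scale_rt_capacity(const struct sd_model *sd,
					     unsigned long used)
{
	unsigned long max = model_arch_scale_cpu_capacity(sd);

	if (used >= max)
		return 1;

	return max - used;
}
```

With an SMT 2 domain, capacity_orig in this model is 589. Passing NULL makes
an idle CPU report 1024 of remaining capacity, which is above capacity_orig;
passing the domain caps it at 589.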
> >
> > Subject: [PATCH] sched/fair: fix scale_rt_capacity() for SMT
> >
> > Since commit:
> > commit 523e979d3164 ("sched/core: Use PELT for scale_rt_capacity()")
> > scale_rt_capacity() returns the remaining capacity and not a scale factor
> > to apply on cpu_capacity_orig. arch_scale_cpu() is directly called by
> > scale_rt_capacity() so we must take the sched_domain argument
> >
> > Fixes: 523e979d3164 ("sched/core: Use PELT for scale_rt_capacity()")
> > Reported-by: Srikar Dronamraju <srikar@xxxxxxxxxxxxxxxxxx>
> > Signed-off-by: Vincent Guittot <vincent.guittot@xxxxxxxxxx>
>
> Reviewed-by: Srikar Dronamraju <srikar@xxxxxxxxxxxxxxxxxx>
>
> --
> Thanks and Regards
> Srikar Dronamraju
>