Re: [PATCH 5/6] sched/fair: Get rid of scaling utilization by capacity_orig

From: Morten Rasmussen
Date: Fri Sep 11 2015 - 05:58:45 EST


On Fri, Sep 11, 2015 at 03:46:51PM +0800, Leo Yan wrote:
> Hi Morten,
>
> On Tue, Sep 08, 2015 at 05:53:31PM +0100, Morten Rasmussen wrote:
> > On Tue, Sep 08, 2015 at 03:31:58PM +0100, Morten Rasmussen wrote:
> > > On Tue, Sep 08, 2015 at 02:52:05PM +0200, Peter Zijlstra wrote:
> > > >
> > > > Something like the below...
> > > >
> > > > Another thing to ponder; the downside of scaled_delta_w is that it's
> > > > fairly likely delta is small and you lose all bits, whereas the weight
> > > > is likely to be large and could lose a few bits without issue.
> > >
> > > That issue applies both to load and util.
> > >
> > > >
> > > > That is, in fixed point scaling like this, you want to start with the
> > > > biggest numbers, not the smallest, otherwise you lose too much.
> > > >
> > > > The flip side is of course that now you can share a multiplication.
> > >
> > > But if we apply the scaling to the weight instead of time, we would only
> > > have to apply it once and not three times like it is now? So maybe we
> > > can end up with almost the same number of multiplications.
> > >
> > > We might be losing bits for low priority tasks running on cpus at a low
> > > frequency though.
> >
> > Something like the below. We should be saving one multiplication.
> >
> > --- 8< ---
> >
> > From: Morten Rasmussen <morten.rasmussen@xxxxxxx>
> > Date: Tue, 8 Sep 2015 17:15:40 +0100
> > Subject: [PATCH] sched/fair: Scale load/util contribution rather than time
> >
> > When updating load/util tracking the time delta might be very small (1)
> > in many cases, scaling it further down with frequency and cpu invariance
> > scaling might cause us to lose precision. Instead of scaling time we
> > can scale the weight of the task for load and the capacity for
> > utilization. Both weight (>=15) and capacity should be significantly
> > bigger in most cases. Low priority tasks might still suffer a bit but
> > the worst case should be improved, as weight is at least 15 before
> > invariance scaling.
> >
> > Signed-off-by: Morten Rasmussen <morten.rasmussen@xxxxxxx>
> > ---
> > kernel/sched/fair.c | 38 +++++++++++++++++++-------------------
> > 1 file changed, 19 insertions(+), 19 deletions(-)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 9301291..d5ee72a 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -2519,8 +2519,6 @@ static u32 __compute_runnable_contrib(u64 n)
> > #error "load tracking assumes 2^10 as unit"
> > #endif
> >
> > -#define cap_scale(v, s) ((v)*(s) >> SCHED_CAPACITY_SHIFT)
> > -
> > /*
> > * We can represent the historical contribution to runnable average as the
> > * coefficients of a geometric series. To do this we sub-divide our runnable
> > @@ -2553,10 +2551,10 @@ static __always_inline int
> > __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
> > unsigned long weight, int running, struct cfs_rq *cfs_rq)
> > {
> > - u64 delta, scaled_delta, periods;
> > + u64 delta, periods;
> > u32 contrib;
> > - unsigned int delta_w, scaled_delta_w, decayed = 0;
> > - unsigned long scale_freq, scale_cpu;
> > + unsigned int delta_w, decayed = 0;
> > + unsigned long scaled_weight = 0, scale_freq, scale_freq_cpu = 0;
> >
> > delta = now - sa->last_update_time;
> > /*
> > @@ -2577,8 +2575,13 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
> > return 0;
> > sa->last_update_time = now;
> >
> > - scale_freq = arch_scale_freq_capacity(NULL, cpu);
> > - scale_cpu = arch_scale_cpu_capacity(NULL, cpu);
> > + if (weight || running)
> > + scale_freq = arch_scale_freq_capacity(NULL, cpu);
> > + if (weight)
> > + scaled_weight = weight * scale_freq >> SCHED_CAPACITY_SHIFT;
> > + if (running)
> > + scale_freq_cpu = scale_freq * arch_scale_cpu_capacity(NULL, cpu)
> > + >> SCHED_CAPACITY_SHIFT;
>
> maybe below question is stupid :)
>
> Why not calculate the scaled_weight depend on cpu's capacity as well?
> So like: scaled_weight = weight * scale_freq_cpu.

IMHO, we should not scale load by cpu capacity since load isn't really
comparable to capacity. It is runnable time based (not running time like
utilization) and the idea is to use it for balancing when the
system is fully utilized. When the system is fully utilized we can't say
anything about the true compute demands of a task; it may get exactly
the cpu time it needs or it may need much more. Hence it doesn't really
make sense to scale the demand by the capacity of the cpu. Two busy
loops on cpus with different cpu capacities should have the same load as
they have the same compute demands.

I mentioned this briefly in the commit message of patch 3 in this
series.

Makes sense?
--