Re: [RESEND PATCH 2/2] sched/fair: Optimize __update_sched_avg()
From: Peter Zijlstra
Date: Fri Mar 31 2017 - 03:01:51 EST
On Thu, Mar 30, 2017 at 03:02:47PM -0700, Paul Turner wrote:
> On Thu, Mar 30, 2017 at 7:14 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> > On Thu, Mar 30, 2017 at 02:16:58PM +0200, Peter Zijlstra wrote:
> >> On Thu, Mar 30, 2017 at 04:21:08AM -0700, Paul Turner wrote:
> >
> >> > > +
> >> > > + if (unlikely(periods >= LOAD_AVG_MAX_N))
> >> > > return LOAD_AVG_MAX;
> >
> >> >
> >> > Is this correct in the iterated periods > LOAD_AVG_MAX_N case?
> >> > I don't think the decay above is guaranteed to return these to zero.
> >>
> >> Ah!
> >>
> >> Indeed, so decay_load() needs LOAD_AVG_PERIOD * 63 before it truncates
> >> to 0, because every LOAD_AVG_PERIOD we halve the value; lose 1 bit; so
> >> 63 of those and we're 0.
> >>
> >> But __accumulate_sum() OTOH returns LOAD_AVG_MAX after only
> >> LOAD_AVG_MAX_N, which is < LOAD_AVG_PERIOD * 63.
> >>
> >> So yes, combined we exceed LOAD_AVG_MAX, which is bad. Let me think what
> >> to do about that.
> >
> >
> > So at the very least it should be decay_load(LOAD_AVG_MAX, 1) (aka
> > LOAD_AVG_MAX - 1024), but that still doesn't account for the !0
> > decay_load() of the first segment.
> >
> > I'm thinking that we can compute the middle segment, by taking the max
> > value and chopping off the ends, like:
> >
> >
> >                   p-1
> >        c2 = 1024 \Sum y^n
> >                   n=1
> >
> >                     inf        inf
> >           = 1024 ( \Sum y^n - \Sum y^n - y^0 )
> >                     n=0        n=p
> >
> >
>
> So this is endemic to what I think is a deeper problem:
I think not.
47742*e((345/32)*l(.5))
27.12819932019487579284
So even without weight, 345 periods isn't enough to flatten
LOAD_AVG_MAX.
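
(That is bc -l notation for 47742 * 0.5^(345/32).) A throwaway userspace
sketch of the same arithmetic, assuming the usual PELT constants and using
double math in place of the kernel's runnable_avg_yN_inv fixed-point table,
so the exact values differ slightly:

/*
 * Sketch only, not kernel code: reproduce the numbers above in double
 * math; the kernel's decay_load() uses the runnable_avg_yN_inv table.
 */
#include <math.h>
#include <stdio.h>

#define LOAD_AVG_PERIOD		32	/* y^32 == 0.5 */
#define LOAD_AVG_MAX		47742	/* ~ 1024 * \Sum_{n=0..inf} y^n */
#define LOAD_AVG_MAX_N		345	/* periods until the sum saturates */

static double decay(double val, int periods)
{
	return val * pow(0.5, (double)periods / LOAD_AVG_PERIOD);
}

int main(void)
{
	/* ~27.13: LOAD_AVG_MAX is nowhere near 0 after LOAD_AVG_MAX_N periods */
	printf("%.2f\n", decay(LOAD_AVG_MAX, LOAD_AVG_MAX_N));

	/*
	 * ~5e-15 here; the kernel's integer decay_load() loses one bit per
	 * LOAD_AVG_PERIOD, so a 64-bit load_sum is only guaranteed to hit 0
	 * after 63 * LOAD_AVG_PERIOD periods.
	 */
	printf("%.2e\n", decay(LOAD_AVG_MAX, 63 * LOAD_AVG_PERIOD));

	return 0;
}
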
> The previous rounds of optimization folded weight into load_sum. I
> think this introduced a number of correctness problems:
I'll argue with you on correctness; but these might well be problems indeed.
> a) the load_avg is no longer independent of weight; a high weight
> entity can linger for eons. [63 LOAD_AVG_PERIODS]
Certainly longer than before, agreed.
> b) it breaks the dynamic response of load_avg on a weight change.
> While nice is not common, there's a case this is really important for,
> which is cgroups with a low number of threads running. E.g. when we
> transition from 1->2 threads we immediately halve the weight, but
> because of the folding it takes a very long time to become numerically
> correct again.
I think this is a matter of semantics, not correctness. We did have that
weight in the past, so our past average including that isn't incorrect
per se.
Similarly, our vruntime includes increments based on prior weight.
Now, you're arguing that this is undesired, and this might well be.
> c) It doesn't work with scale_load_down and fractional shares below
> SCHED_LOAD_SCALE [we multiply in a zero -> zero rq load]
Yes, I noticed this, and this is indeed undesired.
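(Concretely, assuming scale_load_down() still shifts by
SCHED_FIXEDPOINT_SHIFT == 10: any group entity whose share works out below
1024 scales down to a weight of 0, and folding that 0 into load_sum wipes
out whatever contribution had been accumulated.)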
> d) It makes doing stability/clipping above a nightmare.
Unsure what exactly you're referring to; the scale_load_down() boundary
at 0? How is this a separate point from c) then?
> I think it's actually *costing* us cycles, since we end up multiplying
> in the weight on every partial [load_sum] update, but we only actually
> need to compute it when updating load_avg [which is per period
> overflow].
So let's pull it out again -- but I don't think we need to undo all of
Yuyang's patches for that. So please, look at the patch I proposed for
the problem you spotted. Let's fix the current state and take it from
there, ok?
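
FWIW, working the quoted 'chop off the ends' expression through: 1024
\Sum_{n=0..inf} y^n is (up to rounding) LOAD_AVG_MAX, 1024 \Sum_{n=p..inf}
y^n is decay_load(LOAD_AVG_MAX, p), and 1024 y^0 is 1024, so the middle
segment comes out as LOAD_AVG_MAX - decay_load(LOAD_AVG_MAX, p) - 1024,
i.e. the full periods n = 1 .. p-1. A throwaway userspace check of that
identity in double math (LOAD_AVG_MAX is the kernel's rounded constant,
so the two sides differ by a few units):

/*
 * Sketch only: verify
 *
 *   1024 \Sum_{n=1..p-1} y^n  ==  LOAD_AVG_MAX - decay(LOAD_AVG_MAX, p) - 1024
 *
 * in double math.
 */
#include <math.h>
#include <stdio.h>

#define LOAD_AVG_PERIOD	32
#define LOAD_AVG_MAX	47742

static double decay(double val, int periods)
{
	return val * pow(0.5, (double)periods / LOAD_AVG_PERIOD);
}

int main(void)
{
	double y = pow(0.5, 1.0 / LOAD_AVG_PERIOD);
	int p = 5;	/* any number of wholly elapsed periods */
	double direct = 0.0;

	for (int n = 1; n <= p - 1; n++)
		direct += 1024.0 * pow(y, n);

	printf("direct : %f\n", direct);
	printf("chopped: %f\n", LOAD_AVG_MAX - decay(LOAD_AVG_MAX, p) - 1024.0);
	return 0;
}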