Re: [PATCH v3 4/6] sched/fair: Remove scale_load_down() for load_avg

From: Yuyang Du
Date: Thu Apr 28 2016 - 23:11:38 EST


On Thu, Apr 28, 2016 at 12:25:32PM +0200, Peter Zijlstra wrote:
> On Tue, Apr 05, 2016 at 12:12:29PM +0800, Yuyang Du wrote:
> > Currently, load_avg = scale_load_down(load) * runnable%. The extra scaling
> > down of load does not make much sense, because load_avg is primarily THE
> > load and on top of that, we take runnable time into account.
> >
> > We therefore remove scale_load_down() for load_avg. But we need to
> > carefully consider the overflow risk if load has higher range
> > (2*SCHED_FIXEDPOINT_SHIFT). The only case an overflow may occur due
> > to us is on 64bit kernel with increased load range. In that case,
> > the 64bit load_sum can afford 4251057 (=2^64/47742/88761/1024)
> > entities with the highest load (=88761*1024) always runnable on one
> > single cfs_rq, which may be an issue, but should be fine. Even if this
> > occurs at the end of day, on the condition where it occurs, the
> > load average will not be useful anyway.
>
> I do feel we need a little more words on the actual ramification of
> overflowing here.
>
> Yes, having 4m tasks on a single runqueue will be somewhat unlikely, but
> if it happens, then what will the user experience? How long (if ever)
> does it take for numbers to correct themselves etc..

Well, as for the user experience, that would need a stress test study.

But if the system can miraculously survive, and we end up in the scenario
where we have a ~0ULL load_sum and the rq suddenly drops to 0 load, it
would take roughly 2 seconds (=32ms*64) to converge. That is the upper bound.