Re: [PATCH] sched/fair: Optimize sum computation with a lookup table

From: Peter Zijlstra
Date: Fri Apr 08 2016 - 06:45:52 EST


On Fri, Apr 08, 2016 at 12:44:32PM +0200, Peter Zijlstra wrote:
> On Fri, Apr 08, 2016 at 10:07:20AM +0800, Yuyang Du wrote:
> > __compute_runnable_contrib() uses a loop to compute sum, whereas a
> > table loopup can do it faster in a constant time.
>
> > - /* Compute \Sum k^n combining precomputed values for k^i, \Sum k^j */
> > - do {
> > - contrib /= 2; /* y^LOAD_AVG_PERIOD = 1/2 */
> > - contrib += runnable_avg_yN_sum[LOAD_AVG_PERIOD];
> > -
> > - n -= LOAD_AVG_PERIOD;
> > - } while (n > LOAD_AVG_PERIOD);
> > -
> > + /* Since n < LOAD_AVG_MAX_N, n/LOAD_AVG_PERIOD < 11 */
> > + contrib = __accumulated_sum_N32[n/LOAD_AVG_PERIOD];
> > + n %= LOAD_AVG_PERIOD;
> > contrib = decay_load(contrib, n);
> > return contrib + runnable_avg_yN_sum[n];
>
> You replace a simple loop with a DIV instruction and a potential extra
> cachemiss.

Duh, LOAD_AVG_PERIOD is 32, that's a simple shift.