Re: [PATCH 14/16] sched: make __update_entity_runnable_avg() fast

From: Paul Turner
Date: Wed Jul 11 2012 - 20:15:30 EST


So I've been trying to dig up the little proglets that originally
computed this stuff, since some specific adjustments were made but for
the life of me[*] I cannot find it, so I am stuck trying to reverse
engineer it like you :-).
[*] Including some over-night greps on all my source trees.

The short answer is I can explain some of the differences, but not
all; I suspect that perhaps I generated things using a wonky table.
Will update the tables with the tweaked numbers below for next
posting.

Updated values:

inverses for fixed point multiplication by y^k
0: ffffffff
1: fa83b2da
2: f5257d14
3: efe4b99a
4: eac0c6e6
5: e5b906e6
6: e0ccdeeb
7: dbfbb796
8: d744fcc9
9: d2a81d91
10: ce248c14
11: c9b9bd85
12: c5672a10
13: c12c4cc9
14: bd08a39e
15: b8fbaf46
16: b504f333
17: b123f581
18: ad583ee9
19: a9a15ab4
20: a5fed6a9
21: a2704302
22: 9ef5325f
23: 9b8d39b9
24: 9837f050
25: 94f4efa8
26: 91c3d373
27: 8ea4398a
28: 8b95c1e3
29: 88980e80
30: 85aac367
31: 82cd8698
[ Delta versus previous is purely double vs float, no surprises on this one ]

convergence
345> 47765
[ This is accounting for the fact that we're not getting to use a
perfect value of y, but it is the value we will converge to with max
individual updates and our fastmult y^1 approximation ]

sum y^n
[ Where:
exact_n = exact_{n-1} * exact_y + 1024.0
floor_n = FLOOR( floor_{n-1} * exact_y * 1024.0 )
shift_n = approximating exact_n using shift/div for mult
fastmul1 = approximating exact_n using inverse mult of y^1 recursively
fastmul2 = \sum 1024*y^n using inverse mult of y^k

Error terms for the approximations:
sum y^n
exact floor shift fastmul1 fastmul2
1: 1002 -0 0 0 0
2: 1983 -1 0 1 1
3: 2942 -1 -1 1 1
4: 3881 -1 -2 1 1
5: 4800 -2 -3 1 1
6: 5699 -2 -4 1 1
7: 6579 -3 -5 1 1
8: 7440 -3 -6 1 1
9: 8283 -4 -6 2 2
10: 9108 -5 -7 1 2
11: 9914 -5 -8 1 2
12: 10704 -6 -9 1 2
13: 11477 -7 -9 2 3
14: 12233 -7 -10 2 3
15: 12973 -7 -11 2 3
16: 13697 -7 -12 2 3
17: 14405 -7 -13 1 3
18: 15099 -8 -14 1 3
19: 15777 -8 -16 0 3
20: 16441 -8 -17 0 3
21: 17091 -9 -18 0 3
22: 17727 -9 -18 1 4
23: 18349 -9 -20 0 3
24: 18958 -9 -20 1 4
25: 19554 -9 -21 1 4
26: 20137 -9 -22 1 4
27: 20707 -9 -24 1 4
28: 21266 -10 -25 1 4
29: 21812 -10 -27 0 3
30: 22347 -11 -28 1 4
31: 22870 -11 -30 0 3

The concern here is that we don't want approximations that
over-estimate to make possible exceeding our converged max load sum
above, which was accumulated using only single y^n steps.

For this reason I prefer the most conservative floor approximation
which never over-estimates, with errors <0.1%. I think this is what I
chose previously (the first terms all align), but I can't explain the
divergence for higher n.

(Exact values)
exact floor shift fastmul1 fastmul2
1: 1002 1002 1002 1002 1002
2: 1983 1982 1982 1983 1983
3: 2942 2941 2941 2943 2943
4: 3881 3880 3879 3882 3882
5: 4800 4798 4797 4801 4801
6: 5699 5697 5695 5700 5700
7: 6579 6576 6574 6580 6580
8: 7440 7437 7434 7441 7441
9: 8283 8279 8276 8284 8284
10: 9108 9103 9100 9108 9109
11: 9914 9909 9906 9915 9916
12: 10704 10698 10695 10705 10706
13: 11477 11470 11467 11478 11479
14: 12233 12226 12222 12234 12235
15: 12973 12966 12961 12974 12975
16: 13697 13690 13684 13698 13699
17: 14405 14398 14392 14406 14408
18: 15099 15091 15084 15099 15101
19: 15777 15769 15761 15777 15780
20: 16441 16433 16424 16441 16444
21: 17091 17082 17073 17091 17094
22: 17727 17718 17708 17727 17730
23: 18349 18340 18329 18349 18352
24: 18958 18949 18937 18958 18961
25: 19554 19545 19532 19554 19557
26: 20137 20128 20114 20137 20140
27: 20707 20698 20683 20708 20711
28: 21266 21256 21240 21266 21269
29: 21812 21802 21785 21812 21815
30: 22347 22336 22318 22347 22350
31: 22870 22859 22840 22870 22873

And for posterity, a simple generator so that I don't lose it again:
#include <math.h>
#include <stdio.h>

#define SRR(x, y) (((x) + (1UL << ((y) - 1))) >> (y))
#define N 32
#define WMULT_SHIFT 32

const long WMULT_CONST = ((1UL << N) - 1);
const double y = .97857206208770013451;

long approx_decay(int c) {
return (c * 4008) >> 12;
}

long mult_inv_array[N];
void calc_mult_inv() {
int i;
double yn = 0;

printf("inverses\n");
for (i = 0; i < N; i++) {
yn = (double)WMULT_CONST * pow(y, i);
mult_inv_array[i] = yn;
printf("%d: %8lx\n", i, mult_inv_array[i]);
}

printf("\n");
}

long mult_inv(long c, int n) {
return SRR(c * runnable_avg_yN_inv[n], WMULT_SHIFT);
}

void calc_yn_sum(int n)
{
int i;
double sum = 0, sum_fl = 0, diff = 0;
long approx = 0, approx_fm = 0, approx_fm2 = 0;

printf("sum y^n\n");
printf(" %8s %8s %8s %8s %8s\n", "exact", "floor", "shift",
"fastmul1", "fastmul2");
for (i = 1; i < n; i++) {
sum = (y * sum + y * 1024);
sum_fl = floor(y * sum_fl+ y * 1024);
approx = approx_decay(approx) + approx_decay(1024);
approx_fm = mult_inv(approx_fm, 1) + mult_inv(1024, 1);
approx_fm2 += mult_inv(1024, i);

/*diff = sum;*/
printf("%2d: %8.0f %8.0f %8ld %8ld %8ld\n", i, sum,
sum_fl - diff,
approx - (long)diff,
approx_fm - (long)diff,
approx_fm2 - (long)diff);

}
printf("\n");
}

void calc_conv(long n) {
long old_n;
int i = -1;

printf("convergence\n");
do {
old_n = n;
n = mult_inv(n, 1) + 1024;
i++;
} while (n != old_n);
printf("%d> %ld\n", i - 1, n);
printf("\n");
}

void main() {
calc_mult_inv();
calc_conv(1024);
calc_yn_sum(N);
}
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/