Re: [PATCH 09/16] sched: normalize tg load contributions against runnable time

From: Peter Zijlstra
Date: Fri Jul 06 2012 - 07:52:28 EST

On Wed, 2012-07-04 at 21:48 +0200, Peter Zijlstra wrote:
> On Wed, 2012-06-27 at 19:24 -0700, Paul Turner wrote:
> > Entities of equal weight should receive equitable distribution of cpu time.
> > This is challenging in the case of a task_group's shares as execution may be
> > occurring on multiple cpus simultaneously.
> >
> > To handle this we divide up the shares into weights proportionate with the load
> > on each cfs_rq. This does not however, account for the fact that the sum of
> > the parts may be less than one cpu and so we need to normalize:
> > load(tg) = min(runnable_avg(tg), 1) * tg->shares
> > Where runnable_avg is the aggregate time in which the task_group had runnable
> > children.
> I remember we had a bit of a discussion on this last time, I thought you
> were going to convince me this approximation was 'right'.
> Care to still do so.. the rationale used should at least live in a
> comment somewhere, otherwise someone will go silly trying to understand
> things later on.

So if we treat the per-cpu utilization u_i as a probability, then we're
looking for:

P(\Union_{i=1..n} u_i) :=
\Sum_{k=1..n} (-1)^(k-1) \Sum_{|S|=k} P(\Intersection_{i \in S} u_i)

where the inner sum runs over all k-element subsets S of {1..n}.

Computing this, however, is far too expensive; what we can do is
approximate by setting u = avg(u_i) and then using:

u_i == u_j for all i,j

and assuming all variables are independent, giving us:

P(A \Intersection B) = P(A)P(B)

This then yields:

P(\Union_{i=1..n} u_i) ~= \Sum_{k=1..n} (-1)^(k-1) (n choose k) u^k

Which unfortunately isn't a series I found a sane solution for, but
numerically (see below) we can see it very quickly approaches 1 when
n >> 1.
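
For what it's worth, the series does seem to collapse under the binomial
theorem. A short derivation, using only the assumptions already made above
(all u_i equal to u, all variables independent):

```latex
\begin{align*}
\sum_{k=1}^{n} (-1)^{k-1} \binom{n}{k} u^{k}
  &= -\sum_{k=1}^{n} \binom{n}{k} (-u)^{k} \\
  &= 1 - \sum_{k=0}^{n} \binom{n}{k} (-u)^{k} \\
  &= 1 - (1 - u)^{n}
\end{align*}
```

This is just the complement of "no cpu is busy", and (1 - u)^n goes to 0 as
n grows for any fixed u > 0, which agrees with the numerical observation.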

Therefore, the chosen approximation of min(1, \Sum_i u_i) is indeed a
sane approximation since for very small u_i and/or small n the sum is
less likely to exceed 1 and for big u_i and/or big n the clip to 1 is
indeed correct.
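
To make that comparison concrete, here is a minimal sketch that evaluates the
series next to the clipped sum. It assumes, as above, that all u_i equal u and
are independent, so that \Sum_i u_i = n*u. The names p_union and clipped_sum
are made up for this illustration and are not from the patch. Exact rationals
are used because the alternating series cancels badly in floating point.

```python
from fractions import Fraction
from math import comb


def p_union(u: Fraction, n: int) -> Fraction:
    """Inclusion-exclusion series for n independent cpus, each busy with probability u."""
    return sum(
        ((-1) ** (k - 1) * comb(n, k) * u ** k for k in range(1, n + 1)),
        Fraction(0),
    )


def clipped_sum(u: Fraction, n: int) -> Fraction:
    """The chosen approximation min(1, sum_i u_i), with all u_i == u (assumption)."""
    return min(Fraction(1), n * u)


if __name__ == "__main__":
    # Small, medium and large utilization across a few cpu counts.
    for n in (2, 8, 32):
        for u in (Fraction(1, 100), Fraction(1, 10), Fraction(1, 2)):
            series = p_union(u, n)
            approx = clipped_sum(u, n)
            print(f"n={n:2d} u={float(u):.2f} "
                  f"series={float(series):.4f} min(1,n*u)={float(approx):.4f}")
```

The clipped sum never falls below the series (it is the union bound capped at
1). The two are close when n*u is small or when the series is already near 1;
the gap is widest around n*u ~= 1, for example u = 1/2 with n = 2 gives 3/4
against 1.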


Was this what you meant? :-)

Now all that is left is to grok the actual code..


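/* p/10 below needs fractional results; bc defaults to scale=0 unless
 * run with -l, so set it explicitly (20 is what bc -l would use) */
scale = 20

/* factorial */
define f (x) {
	if (x <= 1) return (1);
	return (f(x-1) * x);
}

/* binomial coefficient */
define choose (n,k) {
	return f(n) / (f(n-k) * f(k));
}

/* inclusion-exclusion series for n cpus, each with utilization p */
define pu (p,n) {
	auto s, k

	s = 0;
	for (k = 1; k <= n; k++) {
		s += (-1)^(k-1) * choose(n,k) * p^k;
	}

	return s;
}

/* one row per n, one column per p = 0.1 .. 1.0 */
for (n=2; n<128; n*=2) {
	print n, ": "
	for (p = 1; p < 11; p++) {
		print pu(p/10,n), " "
	}
	print "\n"
}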