Re: fair group scheduler not so fair?

From: Srivatsa Vaddagiri
Date: Fri May 23 2008 - 03:35:45 EST


On Thu, May 22, 2008 at 06:17:37PM -0600, Chris Friesen wrote:
> Peter Zijlstra wrote:
>
>> Given the following:
>>         root
>>       /  |  \
>>     _A_  1   2
>>     /| |\
>>    3 4 5 B
>>         / \
>>        6   7
>>
>>     CPU0            CPU1
>>     root            root
>>     /  \            /  \
>>    A    1          A    2
>>   / \             / \
>>  4   B           3   5
>>     / \
>>    6   7
>
> How do you move specific groups to different cpus. Is this simply using
> cpusets?

No. Moving groups to different cpus is just a group-aware extension to
move_tasks(), which is invoked as part of the regular load balance operation.
move_tasks()->sched_fair_class.load_balance() has been modified to
understand how much the task-groups at various levels (e.g. A at level 1,
B at level 2) contribute to cpu load. It moves tasks between cpus
using this knowledge.

For example, if all the tasks shown above were on the same cpu, CPU0,
this is how it would look:

        CPU0            CPU1
        root            root
      /  |  \
     A   1   2
    /| |\
   3 4 5 B
        / \
       6   7

Then cpu0 load = weight of A + weight of 1 + weight of 2
               = 1024 + 1024 + 1024 = 3072

while cpu1 load = 0

Load to be moved to cut down this imbalance = 3072/2 = 1536

move_tasks() running on CPU1 would iteratively try to pull tasks such
that the total weight moved is <= 1536.

        Task moved      Total Weight moved
        ----------      ------------------
            2           1024
            3           1024 + 256 = 1280
            5           1280 + 256 = 1536

resulting in:

    CPU0            CPU1
    root            root
    /  \            /  \
   A    1          A    2
  / \             / \
 4   B           3   5
    / \
   6   7
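
The balancing arithmetic above can be mimicked with a minimal Python
sketch. This is not the kernel's move_tasks() code: the function name
pull_tasks, the order in which candidates are considered, and the weight
of 256 for task 4 (inferred from A's 1024 being split evenly over its
four children, which matches the 256 shown for tasks 3 and 5) are
illustrative assumptions.

```python
# Illustrative sketch only; not the kernel's move_tasks() implementation.

def pull_tasks(candidates, max_load_move):
    """Greedily pull tasks while the total weight moved stays <= max_load_move.

    candidates: list of (task_name, effective_weight), in the order considered.
    Returns the names of the tasks moved and the total weight moved.
    """
    moved, total = [], 0
    for name, weight in candidates:
        if total + weight <= max_load_move:
            moved.append(name)
            total += weight
    return moved, total


# cpu0 load = weight of A + weight of 1 + weight of 2; cpu1 is idle.
cpu0_load = 1024 + 1024 + 1024
cpu1_load = 0
imbalance = (cpu0_load - cpu1_load) // 2

# Assumed candidate order (hypothetical). Tasks under A count as 256 each
# because A's weight of 1024 is shared by its four children.
candidates = [("2", 1024), ("3", 256), ("5", 256), ("4", 256), ("1", 1024)]

moved, total = pull_tasks(candidates, imbalance)
print(moved, total)  # ['2', '3', '5'] 1536
```

With this candidate order the sketch reproduces the table: tasks 2, 3
and 5 are pulled, and pulling anything further would exceed 1536.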

>> Numerical examples given the above scenario, assuming every body's
>> weight is 1024:
>
>> s_(0,A) = s_(1,A) = 512
>
> Just to make sure I understand what's going on...this is half of 1024
> because it shows up on both cpus?

Not exactly. As Peter put it:

s_(i,g) = W_g * rw_(i,g) / \Sum_j rw_(j,g)

In this case,

s_(0,A) = W_A * rw_(0, A) / \Sum_j rw_(j, A)

W_A = shares given to A by admin = 1024

rw_(0,A) = Weight of 4 + Weight of B = 1024 + 1024 = 2048
rw_(1,A) = Weight of 3 + Weight of 5 = 1024 + 1024 = 2048
\Sum_j rw_(j, A) = 4096

So,

s_(0,A) = 1024 * 2048 / 4096 = 512


>> s_(0,B) = 1024, s_(1,B) = 0
>
> This gets the full 1024 because it's only on one cpu.

Not exactly. rw_(0, B) = \Sum_j rw_(j, B), so the ratio is 1, and that's
why s_(0,B) = W_B = 1024
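
As a quick check of the formula s_(i,g) = W_g * rw_(i,g) / \Sum_j rw_(j,g),
here is a small Python sketch that recomputes the numbers for A and B.
The function name per_cpu_shares and the use of integer division are
assumptions made for illustration; the rw values for B are the ones
Peter quoted (rw_(0,B) = 2048, rw_(1,B) = 0), and W_B = 1024 follows from
every group's weight being 1024 in this example.

```python
# Illustrative sketch only; not taken from the kernel sources.

def per_cpu_shares(group_shares, rw):
    """Split a group's shares W_g across cpus in proportion to rw_(i,g).

    group_shares: W_g, the shares given to the group by the admin.
    rw: list where rw[i] is the group's runqueue weight on cpu i.
    """
    total = sum(rw)
    if total == 0:
        # Group has no runnable load anywhere; nothing to distribute.
        return [0] * len(rw)
    return [group_shares * w // total for w in rw]


# A: rw_(0,A) = rw_(1,A) = 2048
shares_A = per_cpu_shares(1024, [2048, 2048])

# B: rw_(0,B) = 2048, rw_(1,B) = 0
shares_B = per_cpu_shares(1024, [2048, 0])

print(shares_A, shares_B)  # [512, 512] [1024, 0]
```

The split depends only on the ratio of the group's per-cpu runqueue
weights, which is why A ends up with 512 on each cpu and B keeps its
full 1024 on CPU0.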

>> rw_(0,A) = rw_(1,A) = 2048
>> rw_(0,B) = 2048, rw_(1,B) = 0
>
> How do we get 2048? Shouldn't this be 1024?

Hope this is clarified from above.

>> h_load_(0,A) = h_load_(1,A) = 512
>> h_load_(0,B) = 256, h_load_(1,B) = 0
>
> At this point the numbers make sense, but I'm not sure how the formula for
> h_load_ works given that I'm not sure what's going on for rw_.

--
Regards,
vatsa