CFS group scheduler fairness broken starting from 2.6.29-rc1

From: Bharata B Rao
Date: Thu Jul 23 2009 - 03:59:29 EST


Hi,

Group scheduler fairness has been broken since 2.6.29-rc1. git bisect led me
to this commit:

commit ec4e0e2fe018992d980910db901637c814575914
Author: Ken Chen <kenchen@xxxxxxxxxx>
Date: Tue Nov 18 22:41:57 2008 -0800

sched: fix inconsistency when redistribute per-cpu tg->cfs_rq shares

Impact: make load-balancing more consistent

In the update_shares() path leading to tg_shares_up(), the calculation of
per-cpu cfs_rq shares is rather erratic even under moderate task wake up
rate. The problem is that the per-cpu tg->cfs_rq load weight used in the
sd_rq_weight aggregation and actual redistribution of the cfs_rq->shares
are collected at different time. Under moderate system load, we've seen
quite a bit of variation on the cfs_rq->shares and ultimately wildly
affects sched_entity's load weight.

This patch caches the result of initial per-cpu load weight when doing the
sum calculation, and then pass it down to update_group_shares_cpu() for
redistributing per-cpu cfs_rq shares. This allows consistent total cfs_rq
shares across all CPUs. It also simplifies the rounding and zero load
weight check.

Signed-off-by: Ken Chen <kenchen@xxxxxxxxxx>
Acked-by: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>
Signed-off-by: Ingo Molnar <mingo@xxxxxxx>
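
To illustrate the inconsistency the commit message describes, here is a
minimal sketch in Python (not kernel code). It assumes a simplified model in
which a group's shares are split across CPUs in proportion to each CPU's
cfs_rq load weight; the function name, the weights and the share value are
hypothetical, and clamping to minimum/maximum shares is left out.

```python
def distribute_shares(tg_shares, weights_for_sum, weights_for_split):
    """Split tg_shares across CPUs in proportion to per-CPU load weight.

    weights_for_sum   -- per-CPU weights used to compute the total
    weights_for_split -- per-CPU weights used for the actual division
    Passing the same list for both models the cached-snapshot approach.
    """
    total = sum(weights_for_sum)
    return [tg_shares * w // total for w in weights_for_split]


TG_SHARES = 1024                  # hypothetical group shares value

snapshot = [2048, 1024, 1024]     # weights when the sum is taken
later = [3072, 1024, 1024]        # cpu0 gained load before the split

# Same snapshot for sum and split: per-CPU shares add up to TG_SHARES.
consistent = distribute_shares(TG_SHARES, snapshot, snapshot)

# Weights re-read at split time: per-CPU shares no longer add up.
inconsistent = distribute_shares(TG_SHARES, snapshot, later)

print(consistent, sum(consistent))
print(inconsistent, sum(inconsistent))
```

In this model the total stays at the group's shares only when the sum and
the split use the same weights, which is the property the patch aims for.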

======================================================================
                  % CPU time division between groups
Group             2.6.29-rc1      2.6.29-rc1 w/o the above patch
======================================================================
a with 8 tasks    44              31
b with 5 tasks    32              34
c with 3 tasks    22              34
======================================================================
All groups had equal shares.
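
For reference, a small Python sketch of the two baselines the numbers above
can be compared against: an equal split per group (what equal shares should
give) and a split proportional to task count (what one would see with no
group fairness at all). The task counts and percentages are taken from the
table; the helper names are only for illustration.

```python
tasks = {"a": 8, "b": 5, "c": 3}

# Measured % CPU time from the table above.
with_commit = {"a": 44, "b": 32, "c": 22}      # 2.6.29-rc1
without_commit = {"a": 31, "b": 34, "c": 34}   # 2.6.29-rc1 w/o the commit


def equal_share_pct(groups):
    """Expected % per group when all groups have equal shares."""
    return {g: 100.0 / len(groups) for g in groups}


def per_task_pct(groups):
    """% per group if CPU time were simply divided per task."""
    total = sum(groups.values())
    return {g: 100.0 * n / total for g, n in groups.items()}


fair = equal_share_pct(tasks)
flat = per_task_pct(tasks)

for g in tasks:
    print(f"{g}: equal-share {fair[g]:.1f}  per-task {flat[g]:.1f}  "
          f"with commit {with_commit[g]}  without {without_commit[g]}")
```

Without the commit every group is within a few points of the equal-share
value of about 33%. With the commit each group's figure lies between the
equal-share value and the per-task value.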

Regards,
Bharata.