Re: [patch] sched: fix inconsistency when redistribute per-cpu tg->cfs_rq shares.

From: Peter Zijlstra
Date: Wed Nov 19 2008 - 15:51:19 EST


On Tue, 2008-11-18 at 22:41 -0800, Ken Chen wrote:

It just occurred to me that you lost the boost stuff.

The issue with the boost flag is that we have a different
tg->cfs_rq[cpu]->shares
than the actual
tg->se[cpu]->load.weight

so that the starvation shares don't count towards the total distributed
shares.

I guess this is an overestimate vs underestimate issue: you now
underestimate, possibly pessimising your actual bandwidth.
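
To make the overestimate vs underestimate point concrete, here is a rough
userspace sketch, not kernel code. It assumes NICE_0_LOAD = 1024,
MIN_SHARES = 2 and MAX_SHARES = 1 << 18, and it passes the group's shares
in directly instead of summing them per sched domain. The helper names
distribute_old() and distribute_new() are made up for illustration, and the
sysctl_sched_shares_thresh check is left out.

```c
#include <stddef.h>

/* Assumed values, chosen for illustration only. */
#define NICE_0_LOAD 1024UL
#define MIN_SHARES  2UL
#define MAX_SHARES  (1UL << 18)

static unsigned long clamp_shares(unsigned long s)
{
	if (s < MIN_SHARES)
		return MIN_SHARES;
	if (s > MAX_SHARES)
		return MAX_SHARES;
	return s;
}

/*
 * Pre-patch behaviour (hypothetical helper): only real load is summed.
 * An idle cpu is boosted to NICE_0_LOAD when its own share is computed,
 * but 0 is recorded for it. Returns the sum of entity weights handed out.
 */
static unsigned long distribute_old(const unsigned long *load, size_t ncpus,
				    unsigned long tg_shares,
				    unsigned long *se_weight,
				    unsigned long *recorded)
{
	unsigned long sd_rq_weight = 0, total = 0;
	size_t i;

	for (i = 0; i < ncpus; i++)
		sd_rq_weight += load[i];
	if (!sd_rq_weight)
		sd_rq_weight = ncpus * NICE_0_LOAD;

	for (i = 0; i < ncpus; i++) {
		unsigned long w = load[i];
		int boost = 0;

		if (!w) {
			boost = 1;
			w = NICE_0_LOAD;
		}
		if (w > sd_rq_weight)
			w = sd_rq_weight;

		se_weight[i] = clamp_shares(tg_shares * w / (sd_rq_weight + 1));
		recorded[i] = boost ? 0 : se_weight[i];
		total += se_weight[i];
	}
	return total;
}

/*
 * Patched behaviour (hypothetical helper): the idle-cpu weight is part of
 * the sum, so the starvation shares are carved out of the group's total.
 */
static unsigned long distribute_new(const unsigned long *load, size_t ncpus,
				    unsigned long tg_shares,
				    unsigned long *se_weight)
{
	unsigned long sd_rq_weight = 0, total = 0;
	size_t i;

	for (i = 0; i < ncpus; i++)
		sd_rq_weight += load[i] ? load[i] : NICE_0_LOAD;

	for (i = 0; i < ncpus; i++) {
		unsigned long w = load[i] ? load[i] : NICE_0_LOAD;

		se_weight[i] = clamp_shares(tg_shares * w / sd_rq_weight);
		total += se_weight[i];
	}
	return total;
}
```

With loads {2048, 0} and 1024 group shares, the old variant hands out
1023 + 511 = 1534 in entity weight while recording only 1023 (the
overestimate). The new variant hands out 682 + 341 = 1023, so the busy cpu
drops from 1023 to 682 (the underestimate).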



> diff --git a/kernel/sched.c b/kernel/sched.c
> index 9b1e793..1ff78b6 100644
> --- a/kernel/sched.c
> +++ b/kernel/sched.c
> @@ -1473,27 +1473,13 @@ static void
>  update_group_shares_cpu(struct task_group *tg, int cpu,
>  			unsigned long sd_shares, unsigned long sd_rq_weight)
>  {
> -	int boost = 0;
>  	unsigned long shares;
>  	unsigned long rq_weight;
>
>  	if (!tg->se[cpu])
>  		return;
>
> -	rq_weight = tg->cfs_rq[cpu]->load.weight;
> -
> -	/*
> -	 * If there are currently no tasks on the cpu pretend there is one of
> -	 * average load so that when a new task gets to run here it will not
> -	 * get delayed by group starvation.
> -	 */
> -	if (!rq_weight) {
> -		boost = 1;
> -		rq_weight = NICE_0_LOAD;
> -	}
> -
> -	if (unlikely(rq_weight > sd_rq_weight))
> -		rq_weight = sd_rq_weight;
> +	rq_weight = tg->cfs_rq[cpu]->rq_weight;
>
>  	/*
>  	 * \Sum shares * rq_weight
> @@ -1501,7 +1487,7 @@ update_group_shares_cpu
>  	 * \Sum rq_weight
>  	 *
>  	 */
> -	shares = (sd_shares * rq_weight) / (sd_rq_weight + 1);
> +	shares = (sd_shares * rq_weight) / sd_rq_weight;
>  	shares = clamp_t(unsigned long, shares, MIN_SHARES, MAX_SHARES);
>
>  	if (abs(shares - tg->se[cpu]->load.weight) >
> @@ -1510,11 +1496,7 @@ update_group_shares_cpu
>  		unsigned long flags;
>
>  		spin_lock_irqsave(&rq->lock, flags);
> -		/*
> -		 * record the actual number of shares, not the boosted amount.
> -		 */
> -		tg->cfs_rq[cpu]->shares = boost ? 0 : shares;
> -		tg->cfs_rq[cpu]->rq_weight = rq_weight;
> +		tg->cfs_rq[cpu]->shares = shares;
>
>  		__set_se_shares(tg->se[cpu], shares);
>  		spin_unlock_irqrestore(&rq->lock, flags);
> @@ -1528,13 +1510,23 @@ update_group_shares_cpu
>   */
>  static int tg_shares_up(struct task_group *tg, void *data)
>  {
> -	unsigned long rq_weight = 0;
> +	unsigned long weight, rq_weight = 0;
>  	unsigned long shares = 0;
>  	struct sched_domain *sd = data;
>  	int i;
>
>  	for_each_cpu_mask(i, sd->span) {
> -		rq_weight += tg->cfs_rq[i]->load.weight;
> +		/*
> +		 * If there are currently no tasks on the cpu pretend there
> +		 * is one of average load so that when a new task gets to
> +		 * run here it will not get delayed by group starvation.
> +		 */
> +		weight = tg->cfs_rq[i]->load.weight;
> +		if (!weight)
> +			weight = NICE_0_LOAD;
> +
> +		tg->cfs_rq[i]->rq_weight = weight;
> +		rq_weight += weight;
>  		shares += tg->cfs_rq[i]->shares;
>  	}
>
> @@ -1544,9 +1536,6 @@ static int tg_shares_up
>  	if (!sd->parent || !(sd->parent->flags & SD_LOAD_BALANCE))
>  		shares = tg->shares;
>
> -	if (!rq_weight)
> -		rq_weight = cpus_weight(sd->span) * NICE_0_LOAD;
> -
>  	for_each_cpu_mask(i, sd->span)
>  		update_group_shares_cpu(tg, i, shares, rq_weight);
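
A side effect of moving the idle-cpu substitution into tg_shares_up() seems
to be that the summed rq_weight can no longer be zero for a non-empty span,
and that no per-cpu weight can exceed the sum. That would explain why the
patch drops the "+ 1" in the divisor, the cpus_weight() fallback and the
rq_weight > sd_rq_weight clamp. A minimal userspace sketch of the patched
summing loop, assuming NICE_0_LOAD = 1024 and using a made-up helper name
sum_rq_weight():

```c
#include <stddef.h>

/* Assumed value, chosen for illustration only. */
#define NICE_0_LOAD 1024UL

/*
 * Hypothetical stand-in for the patched loop in tg_shares_up(): record a
 * per-cpu weight, substituting NICE_0_LOAD for idle cpus, and return the
 * sum of the recorded weights.
 */
static unsigned long sum_rq_weight(const unsigned long *load,
				   unsigned long *rq_weight, size_t ncpus)
{
	unsigned long total = 0;
	size_t i;

	for (i = 0; i < ncpus; i++) {
		unsigned long weight = load[i];

		if (!weight)
			weight = NICE_0_LOAD;

		rq_weight[i] = weight;
		total += weight;
	}
	return total;
}
```

For three idle cpus this returns 3 * 1024 = 3072, the same value the removed
fallback would have produced, and every recorded rq_weight[i] is at most
the returned total.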
