Re: [PATCH v8 2/4] sched: Rewrite runnable load and utilization average tracking

From: Boqun Feng
Date: Fri Jun 19 2015 - 02:01:06 EST


Hi Yuyang,

On Tue, Jun 16, 2015 at 03:26:05AM +0800, Yuyang Du wrote:
> @@ -5977,36 +5786,6 @@ static void attach_tasks(struct lb_env *env)
> }
>
> #ifdef CONFIG_FAIR_GROUP_SCHED
> -/*
> - * update tg->load_weight by folding this cpu's load_avg
> - */
> -static void __update_blocked_averages_cpu(struct task_group *tg, int cpu)
> -{
> - struct sched_entity *se = tg->se[cpu];
> - struct cfs_rq *cfs_rq = tg->cfs_rq[cpu];
> -
> - /* throttled entities do not contribute to load */
> - if (throttled_hierarchy(cfs_rq))
> - return;
> -
> - update_cfs_rq_blocked_load(cfs_rq, 1);
> -
> - if (se) {
> - update_entity_load_avg(se, 1);
> - /*
> - * We pivot on our runnable average having decayed to zero for
> - * list removal. This generally implies that all our children
> - * have also been removed (modulo rounding error or bandwidth
> - * control); however, such cases are rare and we can fix these
> - * at enqueue.
> - *
> - * TODO: fix up out-of-order children on enqueue.
> - */
> - if (!se->avg.runnable_avg_sum && !cfs_rq->nr_running)
> - list_del_leaf_cfs_rq(cfs_rq);
> - }
> -}
> -
> static void update_blocked_averages(int cpu)
> {
> struct rq *rq = cpu_rq(cpu);
> @@ -6015,17 +5794,17 @@ static void update_blocked_averages(int cpu)
>
> raw_spin_lock_irqsave(&rq->lock, flags);
> update_rq_clock(rq);
> +
> /*
> * Iterates the task_group tree in a bottom up fashion, see
> * list_add_leaf_cfs_rq() for details.
> */
> for_each_leaf_cfs_rq(rq, cfs_rq) {
> - /*
> - * Note: We may want to consider periodically releasing
> - * rq->lock about these updates so that creating many task
> - * groups does not result in continually extending hold time.
> - */
> - __update_blocked_averages_cpu(cfs_rq->tg, rq->cpu);
> + /* throttled entities do not contribute to load */
> + if (throttled_hierarchy(cfs_rq))
> + continue;
> +
> + update_cfs_rq_load_avg(cfs_rq_clock_task(cfs_rq), cfs_rq);

We iterate the task_group tree (actually the corresponding cfs_rqs on that
cpu) because we want to do a bottom-up update. And we want a bottom-up
update because we want to update weighted_cpuload().
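
(For context, weighted_cpuload() in the current code is, IIUC, roughly
just a read of the root cfs_rq's runnable load:

	static unsigned long weighted_cpuload(const int cpu)
	{
		return cpu_rq(cpu)->cfs.runnable_load_avg;
	}

so it only ever sees what has been propagated all the way up to rq->cfs.)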

__update_blocked_averages_cpu(tg, cpu) does three things:

Let's say:
cfs_rq = tg->cfs_rq[cpu]
se = tg->se[cpu]
pcfs_rq = cfs_rq_of(se), which is the parent cfs_rq of cfs_rq.

1. Update cfs_rq->blocked_load_avg, and its contrib to its task group.
2. Update se->avg and calculate the deltas of se->avg.*_avg_contrib.
3. Update pcfs_rq->*_load_avg with the deltas from step 2.

In this way, __update_blocked_averages_cpu(tg, cpu) contributes tg's load
changes to its parent, so that update_blocked_averages() can aggregate all
the load changes into weighted_cpuload().
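
IOW, each iteration of the old loop effectively does the following (an
illustrative sketch with a made-up `delta` variable; the real
update_entity_load_avg() applies the delta internally rather than
returning it):

	/* illustrative sketch only, not the actual kernel code */

	/* 1. decay blocked load and fold it into tg->load_avg */
	update_cfs_rq_blocked_load(cfs_rq, 1);

	/* 2. refresh se->avg and compute the change of its load contrib */
	delta = update_entity_load_avg(se, 1);

	/* 3. the delta lands in the parent's *_load_avg */
	pcfs_rq->runnable_load_avg += delta;

and because the walk is bottom-up, by the time we reach the root the
changes have been folded into rq->cfs, i.e. into weighted_cpuload().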

However, update_cfs_rq_load_avg() only updates cfs_rq->avg; the change is
neither contributed nor aggregated to cfs_rq's parent anywhere in the
for_each_leaf_cfs_rq loop, so that's actually not a bottom-up update.

To fix this, I think we can add an update_cfs_shares(cfs_rq) call after
update_cfs_rq_load_avg(). Like:

for_each_leaf_cfs_rq(rq, cfs_rq) {
- /*
- * Note: We may want to consider periodically releasing
- * rq->lock about these updates so that creating many task
- * groups does not result in continually extending hold time.
- */
- __update_blocked_averages_cpu(cfs_rq->tg, rq->cpu);
+ /* throttled entities do not contribute to load */
+ if (throttled_hierarchy(cfs_rq))
+ continue;
+
+ update_cfs_rq_load_avg(cfs_rq_clock_task(cfs_rq), cfs_rq);
+ update_cfs_shares(cfs_rq);
}

However, I think update_cfs_shares() isn't cheap, because it may do a
bottom-up update each time it's called. So how about just updating the
root cfs_rq? Like:

- /*
- * Iterates the task_group tree in a bottom up fashion, see
- * list_add_leaf_cfs_rq() for details.
- */
- for_each_leaf_cfs_rq(rq, cfs_rq) {
- /*
- * Note: We may want to consider periodically releasing
- * rq->lock about these updates so that creating many task
- * groups does not result in continually extending hold time.
- */
- __update_blocked_averages_cpu(cfs_rq->tg, rq->cpu);
- }
+ update_cfs_rq_load_avg(rq_clock_task(rq), &rq->cfs);

Thanks and Best Regards,
Boqun
