Re: [PATCH 4/4] sched/fair: Implement flat hierarchical structure for util_avg

From: Yuyang Du
Date: Tue Apr 12 2016 - 00:20:08 EST


Hi Vincent,

On Mon, Apr 11, 2016 at 02:29:25PM +0200, Vincent Guittot wrote:
>
> > update any group entity's util, so the net overhead should not be
> > formidable. In addition, rq's util updates can be very frequent,
> > but the updates in a period will be skipped, this is mostly effective
> > in update_blocked_averages().
>
> Not sure to catch your point here but the rq->util of an idle CPU will
> never be updated before a (idle) load balance as we call
> update_cfs_rq_load_avg which doesn't update the cfs_rq->avg.util_avg
> anymore nor rq->avg.util_avg.

I meant in update_blocked_averages(), the rq's util won't be updated
every time a cfs is updated if they are in the same period (~1ms), probabaly
this is the case.

The idle CPU thing is no difference, so it is anyway the responsibility of
any other active CPU to take the idle CPU's idle time into account for any
purpose.

>
> > + if (update_util)
> > + sa->util_avg = sa->util_sum / LOAD_AVG_MAX;
> > +
> > + if (!cfs_rq)
> > + return 1;
> > +
> > + /*
> > + * Update rq's util_sum and util_avg
> > + */
>
> Do we really have to update the utilization of rq each time we update
> a sched_entity ? IMHO, we don't need to do this update so often even
> more if the se is a task_group. But even if it's a task, we don't
> especially need to update it right now but we can wait for the update
> of the rq->cfs like previously or we explicilty update it when we have
> to do a direct sub/add on the util_avg value
> See also my comment on removed_util_avg below
>

No, we only update a rq's util if the update is for cfs, not the sched_entity,
which is the if (!cfs_rq) condition's job

> Why not using the sched_avg of the rq->cfs in order to track the
> utilization of the root cfs_rq instead of adding a new sched_avg into
> the rq ? Then you call update_cfs_rq_load_avg(rq->cfs) when you want
> to update/sync the utilization of the rq->cfs and for one call you
> will update both the load_avg and the util_avg of the root cfs instead
> of duplicating the sequence in _update_load_avg

This is the approach taken by Dietmar in his patch, a fairly easy approach.
The problem is though, if so, we update the root cfs_rq only when it is
the root cfs_rq to update. A simple contrived case will make it never
updated except in update_blocked_averages(). My impression is that this
might be too much precision lost.

And thus we take this alternative approach, and thus I revisited
__update_load_avg() to optimize it.

[snip]

> > - if (atomic_long_read(&cfs_rq->removed_util_avg)) {
> > - long r = atomic_long_xchg(&cfs_rq->removed_util_avg, 0);
> > - sa->util_avg = max_t(long, sa->util_avg - r, 0);
> > - sa->util_sum = max_t(s32, sa->util_sum - r * LOAD_AVG_MAX, 0);
> > + if (atomic_long_read(&rq->removed_util_avg)) {
> > + long r = atomic_long_xchg(&rq->removed_util_avg, 0);
> > + rq->avg.util_avg = max_t(long, rq->avg.util_avg - r, 0);
> > + rq->avg.util_sum = max_t(s32, rq->avg.util_sum - r * LOAD_AVG_MAX, 0);
>
> I see one potential issue here because the rq->util_avg may (surely)
> have been already updated and decayed during the update of a
> sched_entity but before we substract the removed_util_avg

This is the same now, because cfs_rq will be regularly updated in
update_blocked_averages(), which basically means cfs_rq will be newer
than task for sure, although task tries to catch up when removed.
Well, yes, in this patch with rq, the situation is more so. But,
hopefully, this is a formidable concern.