Re: [PATCH 4/4] sched,fair: Fix PELT integrity for new tasks

From: Peter Zijlstra
Date: Fri Jun 24 2016 - 09:03:36 EST


Sorry, I only spotted your reply yesterday.

On Tue, Jun 21, 2016 at 12:51:39PM +0800, Yuyang Du wrote:
> On Tue, Jun 21, 2016 at 10:41:19AM +0200, Peter Zijlstra wrote:

> > The things we ran into with these patches were that:
> >
> > 1) You need to update the cfs_rq _before_ any entity attach/detach
> > (and might need to update_tg_load_avg when update_cfs_rq_load_avg()
> > returns true).
>
> This is intrinsically an additional update, not a fix to anything. I
> don't think it is a must, but I am fine with it.

> > Esp. 1 is important, because while for mathematical consistency you
> > don't actually need to do this, you only need the entities to be
> > up-to-date with the cfs rq when you attach/detach, but that forgets the
> > temporal aspect of _when_ you do this.
>
> Yes, temporally at any instant the avgs are outdated. But, I can have it,
> and what if I have it?

So I see your point; but there's a big difference between 'instant' and
10ms (HZ=100). So by aging the cfs_rq to the instant we fix two issues:

- that it can be up to 10ms stale
- that it can be 'uninitialized' altogether

It also makes the code consistent; all other sites already do this.
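
To illustrate why the ordering matters, here is a toy model, not kernel
code. All names (toy_avg, toy_age, toy_attach) are hypothetical, and it
models only the PELT decay term (y^32 ~= 0.5 per ~1ms period), ignoring
accumulation. The runqueue average is aged to 'now' before the entity's
contribution is added:

```c
#include <assert.h>
#include <stdint.h>

/* Toy model only; every name below is hypothetical, not a kernel symbol. */
#define TOY_PERIOD_US 1024u        /* one decay period, roughly 1ms */
#define TOY_Y         0.97857206   /* per-period decay, y^32 ~= 0.5 */

struct toy_avg {
	uint64_t last_update_time;     /* microseconds */
	double   load_avg;
};

/* Decay the average for every whole period elapsed up to now_us. */
static void toy_age(struct toy_avg *a, uint64_t now_us)
{
	uint64_t periods, i;

	assert(now_us >= a->last_update_time);
	periods = (now_us - a->last_update_time) / TOY_PERIOD_US;
	for (i = 0; i < periods; i++)
		a->load_avg *= TOY_Y;
	a->last_update_time += periods * TOY_PERIOD_US;
}

/* Attach: age the runqueue to the instant first, then add the entity. */
static void toy_attach(struct toy_avg *rq, const struct toy_avg *se,
		       uint64_t now_us)
{
	toy_age(rq, now_us);
	rq->load_avg += se->load_avg;
}
```

In this model, skipping the toy_age() call in toy_attach() means the
next update decays the freshly attached contribution over time during
which the entity was not on the runqueue at all. With a runqueue that is
10 periods stale, that is roughly a 20% error on the new entity's load.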

> > 3) cpu migration is the only exception and uses the last_update_time=0
> > thing -- because of the refusal to take a second rq->lock.
>
> Task's last_update_time means this task is detached from fair queue. This
> (re)definition is by all means much better than migrating. No?

I would maybe redefine it as an up-to-date marker for a migration across
a clock discontinuity. Both CPU migration and group movement suffer from
this, albeit for different reasons.

In the CPU migration case we simply cannot tell time, because of our
refusal to acquire the old rq lock. So we age to the last time we 'know'
and then mark it up-to-date.

For the cgroup move the timelines simply _are_ discontinuous. So we have
to mark it up-to-date after we update it to the instant of detach, such
that when we attach it to the new group we don't try to age it across
the time difference.
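
A minimal sketch of that marker idea, again as a toy model rather than
the actual fair.c code. The names (toy_se, toy_se_age, toy_se_detach,
toy_se_attach) are hypothetical, and it assumes a valid clock reading is
never 0, so 0 is free to act as the marker:

```c
#include <assert.h>
#include <stdint.h>

/* Toy model only; every name below is hypothetical, not a kernel symbol. */
#define SE_PERIOD_US 1024u
#define SE_DECAY_Y   0.97857206    /* per-period decay, y^32 ~= 0.5 */

struct toy_se {
	uint64_t last_update_time;     /* 0 == "up to date, not on any timeline" */
	double   load_avg;
};

/* Age along the current timeline; a no-op while the marker is set. */
static void toy_se_age(struct toy_se *se, uint64_t now_us)
{
	uint64_t periods, i;

	if (se->last_update_time == 0)
		return;
	assert(now_us >= se->last_update_time);
	periods = (now_us - se->last_update_time) / SE_PERIOD_US;
	for (i = 0; i < periods; i++)
		se->load_avg *= SE_DECAY_Y;
	se->last_update_time += periods * SE_PERIOD_US;
}

/* Detach: age to the instant on the old clock, then set the marker. */
static void toy_se_detach(struct toy_se *se, uint64_t old_now_us)
{
	toy_se_age(se, old_now_us);
	se->last_update_time = 0;
}

/* Attach: never age across the discontinuity; adopt the new clock. */
static void toy_se_attach(struct toy_se *se, uint64_t new_now_us)
{
	toy_se_age(se, new_now_us);    /* no-op when the marker is set */
	se->last_update_time = new_now_us;
}
```

The new timeline may well read earlier than the old one; without the
marker, toy_se_age() would either trip its assert or decay the entity
across a meaningless time difference.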

> > Which is why I dislike Yuyang's patches, they create more exceptions
> > instead of applying existing rules (albeit undocumented).
> >
>
> I am thinking about documenting this really well, like "An art of load tracking:
> accuracy, overhead, and usefulness", seriously.

Any attempt to document all this would be greatly appreciated, although
I would like it to be in comments in the fair.c file itself if possible.