Re: [RFC][PATCH] sched: attach extra runtime to the right avg
From: Josef Bacik
Date: Tue Jul 04 2017 - 08:48:29 EST
On Tue, Jul 04, 2017 at 02:40:03PM +0200, Peter Zijlstra wrote:
> On Tue, Jul 04, 2017 at 02:21:50PM +0200, Peter Zijlstra wrote:
> > On Tue, Jul 04, 2017 at 12:13:09PM +0200, Ingo Molnar wrote:
> > >
> > > This code on the other hand:
> > >
> > > sa->last_update_time += delta << 10;
> > >
> > > ... in essence creates a whole new absolute clock value that slowly but surely is
> > > drifting away from the real rq->clock, because 'delta' is always rounded down to
> > > the nearest 1024 ns boundary, so we accumulate the 'remainder' losses.
> > >
> > > That is because:
> > >
> > > delta >>= 10;
> > > ...
> > > sa->last_update_time += delta << 10;
> > >
> > > Given enough time, ->last_update_time can drift a long way, and this delta:
> > >
> > > delta = now - sa->last_update_time;
> > >
> > > ... becomes meaningless AFAICS, because it's essentially two different clocks that
> > > get compared.
> >
> > Thing is, once you drift over 1023 (ns) your delta increases and you
> > catch up again.
> >
> >
> >
> > A B C D E F
> > | | | | | |
> > +----+----+----+----+----+----+----+----+----+----+----+
> >
> >
> > A: now = 0
> > sa->last_update_time = 0
> > delta := (now - sa->last_update_time) >> 10 = 0
> >
> > B: now = 614 (+614)
> > delta = (614 - 0) >> 10 = 0
> > sa->last_update_time += 0 (0)
> > sa->last_update_time = now & ~1023 (0)
> >
> > C: now = 1843 (+1229)
> > delta = (1843 - 0) >> 10 = 1
> > sa->last_update_time += 1024 (1024)
> > sa->last_update_time = now & ~1023 (1024)
> >
> >
> > D: now = 3481 (+1638)
> > delta = (3481 - 1024) >> 10 = 2
> > sa->last_update_time += 2048 (3072)
> > sa->last_update_time = now & ~1023 (3072)
> >
> > E: now = 5734 (+2253)
> > delta = (5734 - 3072) = 2
> > sa->last_update_time += 2048 (5120)
> > sa->last_update_time = now & ~1023 (5120)
> >
> > F: now = 6348 (+614)
> > delta = (6348 - 5120) >> 10 = 1
> > sa->last_update_time += 1024 (6144)
> > sa->last_update_time = now & ~1023 (6144)
> >
> >
> >
> > And you'll see that both are identical, and that both D and F have
> > gotten a spill from sub-chunk accounting.
>
>
> Where the two approaches differ is when we have different modifications
> to sa->last_update_time (and we do).
>
> The differential (+=) one does not mandate initial value of
> ->last_update_time has the bottom 9 bits cleared. It will simply
> continue from wherever.
>
> The absolute (&) one however mandates that ->last_update_time always has
> the bottom few bits 0, otherwise we can 'gain' time. The first iteration
> will clear those bits and we'll then double account them.
>
> It so happens that we have an explicit assign in migrate
> (attach_entity_load_avg / set_task_rq_fair). And on negative delta. In
> all those cases we use the immediate 'now' value, no clearing of bottom
> bits.
>
> The differential should work fine with that, the absolute one has double
> accounting issues in that case.
>
> So it would be very good to find what exactly causes Josef's workload to
> get 'fixed'.
Sorry let me experiment some more, like I said this is one of 4 patches I need
to actually fix my workload, and I've tested so many iterations of this problem
that I may be _thinking_ it affects things but it really doesn't. I'll re-test
with the normal code and the other 3 patches in place and see if things are ok.
Thanks,
Josef