Re: [PATCH 2/2] sched: update runqueue clock before migrations away

From: Peter Zijlstra
Date: Tue Dec 17 2013 - 10:52:11 EST


On Tue, Dec 17, 2013 at 02:09:13PM +0000, Chris Redpath wrote:
> On 12/12/13 18:24, Peter Zijlstra wrote:
> >Would pre_schedule_idle() -> rq_last_tick_reset() -> rq->last_sched_tick
> >be useful?
> >
> >I suppose we could easily lift that to NO_HZ_COMMON.
> >
>
> Many thanks for the tip, Peter. I have tried this out and it provides
> enough information to correct the problem. The new version doesn't
> update the rq; it just carries the extra unaccounted time (estimated
> from jiffies) over to be processed during enqueue.
>
> However, before I send a new patch set I have a question about the
> existing behavior. Ben, you may already know the answer to this?
>
> During a wake migration we call __synchronize_entity_decay in
> migrate_task_rq_fair, which decays avg.runnable_avg_sum. We also record
> the number of periods we decayed for as a negative value in
> avg.decay_count.
>
> We then enqueue the task on its target runqueue, and again we decay the load
> by the number of periods it has been off-rq.
>
> if (unlikely(se->avg.decay_count <= 0)) {
>         se->avg.last_runnable_update = rq_clock_task(rq_of(cfs_rq));
>         if (se->avg.decay_count) {
>                 se->avg.last_runnable_update -= (-se->avg.decay_count)
>                                                         << 20;
> >>>             update_entity_load_avg(se, 0);
>
> Am I misunderstanding how this is supposed to work, or have we always
> been double-accounting sleep time for wake migrations?

The way I understood this is that on migrate we sync from dequeue to the
migrate point, subtract the 'current (at point of migrate)' blocked
load from the src rq, and add it to the dst rq.

Then on enqueue we sync again, but now from migrate to enqueue.
--