Re: [PATCH] sched/fair: Do not decay new task load on first enqueue

From: Peter Zijlstra
Date: Wed Sep 28 2016 - 06:14:40 EST


On Fri, Sep 23, 2016 at 12:58:08PM +0100, Matt Fleming wrote:
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 8fb4d1942c14..4a2d3ff772f8 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -3142,7 +3142,7 @@ enqueue_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
>  	int migrated, decayed;
>
>  	migrated = !sa->last_update_time;
> -	if (!migrated) {
> +	if (!migrated && se->sum_exec_runtime) {
>  		__update_load_avg(now, cpu_of(rq_of(cfs_rq)), sa,
>  			se->on_rq * scale_load_down(se->load.weight),
>  			cfs_rq->curr == se, NULL);


Hrmm... so I see the problem, but I think we're working around it rather
than fixing the underlying issue.

So the problem is that the rq clock advances between wake_up_new_task()
doing post_init_entity_util_avg(), which attaches us to the cfs_rq, and
activate_task(), which enqueues us.

Part of the problem is that we do not, in fact, seem to call
update_rq_clock() before post_init_entity_util_avg(), which makes the
delta larger than it should be.

The other problem is that activate_task()->enqueue_task() does call
update_rq_clock() (again, once the above is fixed), which is what
creates the delta.
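
To spell the window out: the pre-patch flow in wake_up_new_task()
(reconstructed from the hunks below; everything not relevant to the
clock elided) is roughly:

	rq = __task_rq_lock(p, &rf);		/* rq clock is stale here       */
	post_init_entity_util_avg(&p->se);	/* attach reads the stale clock */
	activate_task(rq, p, 0);		/* enqueue_task() then calls    */
						/* update_rq_clock(), so the    */
						/* fair-class enqueue sees the  */
						/* whole delta                  */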

Which suggests we do something like the below (not compile-tested or
anything; also, I ran out of tea again).

While staring at this, I don't think we can hit vruntime_normalized()
with a new task any more, so I _think_ we can remove the
!se->sum_exec_runtime clause there (and rejoice), no?
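
For reference, the clause in question reads roughly like this (quoting
vruntime_normalized() from memory, so double-check against the tree):

	static inline bool vruntime_normalized(struct task_struct *p)
	{
		struct sched_entity *se = &p->se;

		if (p->on_rq)
			return true;

		/*
		 * !se->sum_exec_runtime is the "forked child that hasn't
		 * been woken by wake_up_new_task() yet" case.
		 */
		if (!se->sum_exec_runtime || p->state == TASK_WAKING)
			return true;

		return false;
	}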


---
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 7e7463aa399a..cc59bd4ab809 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -754,9 +754,16 @@ static void set_load_weight(struct task_struct *p)

 static inline void enqueue_task(struct rq *rq, struct task_struct *p, int flags)
 {
-	update_rq_clock(rq);
+	/*
+	 * For ENQUEUE_RESTORE, DEQUEUE_SAVE will have updated the rq-clock,
+	 * for ENQUEUE_NEW wake_up_new_task() will have.
+	 */
+	if (!(flags & (ENQUEUE_RESTORE | ENQUEUE_NEW)))
+		update_rq_clock(rq);
+
 	if (!(flags & ENQUEUE_RESTORE))
 		sched_info_queued(rq, p);
+
 	p->sched_class->enqueue_task(rq, p, flags);
 }

@@ -2577,9 +2584,11 @@ void wake_up_new_task(struct task_struct *p)
 	__set_task_cpu(p, select_task_rq(p, task_cpu(p), SD_BALANCE_FORK, 0));
 #endif
 	rq = __task_rq_lock(p, &rf);
+
+	update_rq_clock(rq);
 	post_init_entity_util_avg(&p->se);
+	activate_task(rq, p, ENQUEUE_NEW);
 
-	activate_task(rq, p, 0);
 	p->on_rq = TASK_ON_RQ_QUEUED;
 	trace_sched_wakeup_new(p);
 	check_preempt_curr(rq, p, WF_FORK);
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 7c7e5745038b..3982d7dc9bff 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1193,6 +1193,7 @@ extern const u32 sched_prio_to_wmult[40];
 #else
 #define ENQUEUE_MIGRATED	0x00
 #endif
+#define ENQUEUE_NEW		0x40
 
 #define RETRY_TASK		((void *)-1UL)