Re: [PATCH 4/4] sched,fair: Fix PELT integrity for new tasks

From: Vincent Guittot
Date: Fri Jun 17 2016 - 10:09:27 EST


On 17 June 2016 at 14:01, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> Vincent and Yuyang found another few scenarios in which entity
> tracking goes wobbly.
>
> The scenarios are basically due to the fact that new tasks are not
> immediately attached and thereby differ from the normal situation -- a
> task is always attached to a cfs_rq load average (such that it
> includes its blocked contribution) and is explicitly
> detached/attached on migration to another cfs_rq.
>
> Scenario 1: switch to fair class
>
>   p->sched_class = fair_class;
>   if (queued)
>     enqueue_task(p);
>       ...
>         enqueue_entity()
>           enqueue_entity_load_avg()
>             migrated = !sa->last_update_time (true)
>             if (migrated)
>               attach_entity_load_avg()
>   check_class_changed()
>     switched_from() (!fair)
>     switched_to()   (fair)
>       switched_to_fair()
>         attach_entity_load_avg()
>
> If @p is a new task that hasn't been fair before, it will have
> !last_update_time and, per the above, end up in
> attach_entity_load_avg() _twice_.
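
To make the double attach in scenario 1 concrete, here is a rough userspace toy model. It is not kernel code: the toy_* structs and helpers are made up for illustration, and a real attach also ages the average, which is left out here. It only shows that attaching once from the enqueue path (because last_update_time is 0) and once more from switched_to_fair() counts the task's load twice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for sched_entity / cfs_rq; not the kernel types. */
struct toy_se {
	uint64_t last_update_time;	/* 0 is treated as "never attached" */
	long load_avg;
};

struct toy_cfs_rq {
	uint64_t clock;
	long load_avg;
};

/* Add the entity's load to the runqueue and stamp the entity. */
static void toy_attach(struct toy_cfs_rq *cfs_rq, struct toy_se *se)
{
	assert(cfs_rq && se);
	se->last_update_time = cfs_rq->clock;
	cfs_rq->load_avg += se->load_avg;
}

/* Enqueue path: attach only when last_update_time is still 0. */
static void toy_enqueue(struct toy_cfs_rq *cfs_rq, struct toy_se *se)
{
	bool migrated = !se->last_update_time;

	if (migrated)
		toy_attach(cfs_rq, se);
}

/* Scenario 1: enqueue a new task, then switched_to_fair() attaches again. */
static long toy_scenario1(long task_load)
{
	struct toy_cfs_rq rq = { .clock = 1000, .load_avg = 0 };
	struct toy_se se = { .last_update_time = 0, .load_avg = task_load };

	toy_enqueue(&rq, &se);	/* first attach, via the "migrated" check */
	toy_attach(&rq, &se);	/* second attach, unconditional */

	return rq.load_avg;
}
```

With a task load of 512, toy_scenario1() leaves the toy runqueue at 1024, i.e. the same task counted twice.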
>
> Scenario 2: change between cgroups
>
>   sched_move_group(p)
>     if (queued)
>       dequeue_task()
>     task_move_group_fair()
>       detach_task_cfs_rq()
>         detach_entity_load_avg()
>       set_task_rq()
>       attach_task_cfs_rq()
>         attach_entity_load_avg()
>     if (queued)
>       enqueue_task();
>         ...
>           enqueue_entity()
>             enqueue_entity_load_avg()
>               migrated = !sa->last_update_time (true)
>               if (migrated)
>                 attach_entity_load_avg()
>
> As in scenario 1, if @p is a new task, it will have
> !last_update_time and we'll end up in attach_entity_load_avg()
> _twice_.
>
> Furthermore, notice how we do a detach_entity_load_avg() on something
> that wasn't attached to begin with.
>
> As stated above, the problem is that the new task isn't yet attached
> to the load tracking and thereby violates the invariant assumption.
>
> This patch remedies this by ensuring a new task is indeed properly
> attached to the load tracking on creation, through
> post_init_entity_util_avg().
>
> Of course, this isn't entirely as straightforward as one might think,
> since the task is hashed before we call wake_up_new_task() and thus
> can be poked at. We avoid this by adding TASK_NEW and teaching
> cpu_cgroup_can_attach() to refuse such tasks.
>
> Cc: Yuyang Du <yuyang.du@xxxxxxxxx>
> Reported-by: Vincent Guittot <vincent.guittot@xxxxxxxxxx>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
> ---
...
>
> +static inline u64 cfs_rq_clock_task(struct cfs_rq *cfs_rq);
> +static int update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq, bool update_freq);
> +static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se);
> +
> /*
> * With new tasks being created, their initial util_avgs are extrapolated
> * based on the cfs_rq's current util_avg:
> @@ -733,18 +737,21 @@ void post_init_entity_util_avg(struct sc
> }
> sa->util_sum = sa->util_avg * LOAD_AVG_MAX;
> }
> +
> + update_cfs_rq_load_avg(cfs_rq_clock_task(cfs_rq), cfs_rq, false);
> + attach_entity_load_avg(cfs_rq, se);

A new RT task will be attached and will contribute to the load until
it decays to 0.
Should we detach it for a non-CFS task? We just want to set the
last_update_time of the RT task to something other than 0.
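
As a rough illustration of what I mean, here is a userspace toy model. It is not a proposed patch: the demo_* names and the is_fair flag are hypothetical, and the real code would look at p->sched_class instead. The idea is that a non-fair task only gets its last_update_time stamped, so it never adds load to the cfs_rq, while a fair task is attached as in the hunk above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for sched_entity / cfs_rq; not the kernel types. */
struct demo_se {
	uint64_t last_update_time;
	long load_avg;
	bool is_fair;		/* assumption: replaces a sched_class check */
};

struct demo_cfs_rq {
	uint64_t clock;
	long load_avg;
};

/* Toy version of the tail of post_init_entity_util_avg(). */
static void demo_post_init(struct demo_cfs_rq *cfs_rq, struct demo_se *se)
{
	assert(cfs_rq && se);

	if (!se->is_fair) {
		/* Non-fair task: stamp the time, contribute no load. */
		se->last_update_time = cfs_rq->clock;
		return;
	}

	/* Fair task: attach, i.e. stamp the time and add the load. */
	se->last_update_time = cfs_rq->clock;
	cfs_rq->load_avg += se->load_avg;
}

/* Return the runqueue load after initialising a single new task. */
static long demo_load_after_init(bool is_fair, long task_load)
{
	struct demo_cfs_rq rq = { .clock = 1000, .load_avg = 0 };
	struct demo_se se = {
		.last_update_time = 0,
		.load_avg = task_load,
		.is_fair = is_fair,
	};

	demo_post_init(&rq, &se);
	return rq.load_avg;
}
```

In this model a new RT task with load 512 leaves the toy runqueue at 0, while a fair task leaves it at 512, and in both cases last_update_time ends up non-zero.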

> }
>
> static inline unsigned long cfs_rq_runnable_load_avg(struct cfs_rq *cfs_rq);
> static inline unsigned long cfs_rq_load_avg(struct cfs_rq *cfs_rq);
> -#else
> +#else /* !CONFIG_SMP */
> void init_entity_runnable_average(struct sched_entity *se)
> {
> }
> void post_init_entity_util_avg(struct sched_entity *se)
> {
> }
> -#endif
> +#endif /* CONFIG_SMP */
>
> /*
> * Update the current task's runtime statistics.
> @@ -2847,8 +2854,6 @@ void set_task_rq_fair(struct sched_entit
> static inline void update_tg_load_avg(struct cfs_rq *cfs_rq, int force) {}
> #endif /* CONFIG_FAIR_GROUP_SCHED */
>
> -static inline u64 cfs_rq_clock_task(struct cfs_rq *cfs_rq);
> -
> static inline void cfs_rq_util_change(struct cfs_rq *cfs_rq)
> {
> struct rq *rq = rq_of(cfs_rq);
> @@ -2958,6 +2963,8 @@ static void attach_entity_load_avg(struc
> /*
> * If we got migrated (either between CPUs or between cgroups) we'll
> * have aged the average right before clearing @last_update_time.
> + *
> + * Or we're fresh through post_init_entity_util_avg().
> */
> if (se->avg.last_update_time) {
> __update_load_avg(cfs_rq->avg.last_update_time, cpu_of(rq_of(cfs_rq)),
> @@ -3063,11 +3070,14 @@ void remove_entity_load_avg(struct sched
> u64 last_update_time;
>
> /*
> - * Newly created task or never used group entity should not be removed
> - * from its (source) cfs_rq
> + * tasks cannot exit without having gone through wake_up_new_task() ->
> + * post_init_entity_util_avg() which will have added things to the
> + * cfs_rq, so we can remove unconditionally.
> + *
> + * Similarly for groups, they will have passed through
> + * post_init_entity_util_avg() before unregister_sched_fair_group()
> + * calls this.
> */
> - if (se->avg.last_update_time == 0)
> - return;
>
> last_update_time = cfs_rq_last_update_time(cfs_rq);
>
>
>