Re: [PATCH] sched/fair: update_curr changed sum_exec_runtime to 1 when sum_exec_runtime is 0 because some kernel code uses sum_exec_runtime == 0 to test whether a task has just been forked.

From: Peter Zijlstra
Date: Tue Aug 27 2019 - 10:36:52 EST


On Mon, Aug 26, 2019 at 07:46:50PM +0800, QiaoChong wrote:
> From: Chong Qiao <qiaochong@xxxxxxxxxxx>
>
> Such as:
> cpu_cgroup_attach>
> sched_move_task>
> task_change_group_fair>
> task_move_group_fair>
> detach_task_cfs_rq>
> vruntime_normalized>
>
> 	/*
> 	 * When !on_rq, vruntime of the task has usually NOT been normalized.
> 	 * But there are some cases where it has already been normalized:
> 	 *
> 	 * - A forked child which is waiting for being woken up by
> 	 *   wake_up_new_task().
> 	 * - A task which has been woken up by try_to_wake_up() and
> 	 *   waiting for actually being woken up by sched_ttwu_pending().
> 	 */
> 	if (!se->sum_exec_runtime ||
> 	    (p->state == TASK_WAKING && p->sched_remote_wakeup))
> 		return true;
>
> p->se.sum_exec_runtime being 0 does not necessarily mean the task has never run (i.e. that it is a forked child waiting to be woken up by wake_up_new_task()).
>
> A task may have been scheduled multiple times while p->se.sum_exec_runtime is still 0, because delta_exec may be 0 in update_curr.
>
> static void update_curr(struct cfs_rq *cfs_rq)
> {
> 	...
> 	delta_exec = now - curr->exec_start;
> 	if (unlikely((s64)delta_exec <= 0))
> 		return;
> 	...
>
> 	curr->sum_exec_runtime += delta_exec;
> 	...
> }
>
> A task that has run and is now stopped (on_rq == 0) has a vruntime that has not been normalized, yet se->sum_exec_runtime == 0.
> This causes vruntime_normalized() to return true, so the vruntime is not normalized.
> The task may then keep the old vruntime from its old cgroup, which may be much larger than the task's vruntime in the new cgroup.
> As a result, the task may not be scheduled from the run queue for a long time after being woken up.
>
> Now I change sum_exec_runtime to 1 in update_curr when sum_exec_runtime == 0, to make sure sum_exec_runtime is not 0.

Have you actually observed this? It is very hard to have a 0 delta
between two scheduling events.
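
For reference, the scenario in the quoted description can be modelled in userspace roughly as below. This is only a sketch, not kernel code: struct model_se, model_update_curr() and model_looks_just_forked() are hypothetical names, the proposed_fix flag is an assumption about where the patch would set the counter to 1, and only the first half of the vruntime_normalized() test is modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two sched_entity fields discussed above. */
struct model_se {
	uint64_t exec_start;       /* timestamp of the last accounting point */
	uint64_t sum_exec_runtime; /* total accounted runtime */
};

/*
 * Mirrors the quoted update_curr() excerpt: a non-positive delta is
 * dropped without touching sum_exec_runtime.  With proposed_fix set,
 * the counter is bumped to 1 instead of being left at 0 (assumed
 * placement of the change; the patch itself is not shown in the thread).
 */
static void model_update_curr(struct model_se *se, uint64_t now,
			      bool proposed_fix)
{
	uint64_t delta_exec = now - se->exec_start;

	if ((int64_t)delta_exec <= 0) {
		if (proposed_fix && se->sum_exec_runtime == 0)
			se->sum_exec_runtime = 1;
		return;
	}

	se->exec_start = now;
	se->sum_exec_runtime += delta_exec;
}

/* First half of the quoted check: "has this task never been accounted?" */
static bool model_looks_just_forked(const struct model_se *se)
{
	return se->sum_exec_runtime == 0;
}
```

Calling model_update_curr() with now equal to exec_start leaves the entity looking freshly forked unless proposed_fix is set, which is the case the patch is trying to cover. Whether a zero delta actually occurs on a real runqueue clock is exactly the open question.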