[tip: sched/core] sched: Initialize the vruntime of a new task when it is first enqueued

From: tip-bot2 for Zhang Qiao
Date: Mon Jul 29 2024 - 06:36:51 EST


The following commit has been merged into the sched/core branch of tip:

Commit-ID: c40dd90ac045fa1fdf6acc5bf9109a2315e6c92c
Gitweb: https://git.kernel.org/tip/c40dd90ac045fa1fdf6acc5bf9109a2315e6c92c
Author: Zhang Qiao <zhangqiao22@xxxxxxxxxx>
AuthorDate: Thu, 27 Jun 2024 21:33:59 +08:00
Committer: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
CommitterDate: Mon, 29 Jul 2024 12:22:34 +02:00

sched: Initialize the vruntime of a new task when it is first enqueued

When creating a new task, we initialize the vruntime of the new task in
sched_cgroup_fork(). However, this happens too early, so the resulting
value may not be accurate.

This is because the vruntime is initialized using the current CPU, whereas
the new task actually runs on the CPU that is assigned in
wake_up_new_task().

To address this, pass the ENQUEUE_INITIAL flag to activate_task() in
wake_up_new_task(). That way, when place_entity() is called from
enqueue_entity(), the vruntime of the new task is initialized.
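
As a rough illustration of why the placement time matters, the toy model
below gives each CPU its own runqueue and derives a new task's vruntime
from whichever runqueue is consulted. Everything prefixed with toy_ or
TOY_ is hypothetical and not a kernel symbol, and the placement rule is
deliberately reduced to the historic "se->vruntime = cfs_rq->min_vruntime"
assignment; the real place_entity() is considerably more involved.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NR_CPUS 2

/* Hypothetical stand-ins for the kernel's cfs_rq and sched_entity. */
struct toy_cfs_rq {
	uint64_t min_vruntime;
};

struct toy_se {
	uint64_t vruntime;
};

/* Simplified placement: start the entity at the runqueue's min_vruntime. */
void toy_place_entity(const struct toy_cfs_rq *cfs_rq, struct toy_se *se)
{
	se->vruntime = cfs_rq->min_vruntime;
}

/*
 * Return the vruntime a new task would get if it were placed against
 * the runqueue of the given CPU. Calling this with the forking CPU
 * models fork-time placement; calling it with the CPU chosen at wakeup
 * models placement at first enqueue.
 */
uint64_t toy_initial_vruntime(const struct toy_cfs_rq *rqs, int cpu)
{
	struct toy_se se = { 0 };

	assert(cpu >= 0 && cpu < TOY_NR_CPUS);
	toy_place_entity(&rqs[cpu], &se);
	return se.vruntime;
}
```

In this model, if the two runqueues have min_vruntime values of 1000 and
5000, a task forked on CPU 0 but woken on CPU 1 gets 1000 from fork-time
placement and 5000 from enqueue-time placement. Only the latter matches
the runqueue the task actually joins.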

In addition, place_entity() in task_fork_fair() was introduced for two
reasons:
1. Previously, __enqueue_entity() was called from task_new_fair(). To
provide a vruntime for enqueueing the new task, the assignment
"se->vruntime = cfs_rq->min_vruntime" was introduced by commit
e9acbff6484d ("sched: introduce se->vruntime"). This was the initial
form of place_entity().

2. Commit 4d78e7b656aa ("sched: new task placement for vruntime") added
the child_runs_first task placement feature, which is based on vruntime
and therefore also requires the new task's vruntime value.

Now that child_runs_first and enqueue_entity() have been removed from
task_fork_fair(), this place_entity() call no longer serves a purpose, so
remove it as well.

Signed-off-by: Zhang Qiao <zhangqiao22@xxxxxxxxxx>
Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
Link: https://lkml.kernel.org/r/20240627133359.1370598-1-zhangqiao22@xxxxxxxxxx
---
kernel/sched/core.c | 2 +-
kernel/sched/fair.c | 15 ---------------
2 files changed, 1 insertion(+), 16 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index f3951e4..2c61b4f 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4686,7 +4686,7 @@ void wake_up_new_task(struct task_struct *p)
update_rq_clock(rq);
post_init_entity_util_avg(p);

- activate_task(rq, p, ENQUEUE_NOCLOCK);
+ activate_task(rq, p, ENQUEUE_NOCLOCK | ENQUEUE_INITIAL);
trace_sched_wakeup_new(p);
wakeup_preempt(rq, p, WF_FORK);
#ifdef CONFIG_SMP
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 9057584..e8cdfeb 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -12702,22 +12702,7 @@ static void task_tick_fair(struct rq *rq, struct task_struct *curr, int queued)
*/
static void task_fork_fair(struct task_struct *p)
{
- struct sched_entity *se = &p->se, *curr;
- struct cfs_rq *cfs_rq;
- struct rq *rq = this_rq();
- struct rq_flags rf;
-
- rq_lock(rq, &rf);
- update_rq_clock(rq);
-
set_task_max_allowed_capacity(p);
-
- cfs_rq = task_cfs_rq(current);
- curr = cfs_rq->curr;
- if (curr)
- update_curr(cfs_rq);
- place_entity(cfs_rq, se, ENQUEUE_INITIAL);
- rq_unlock(rq, &rf);
}

/*