On Mon, Mar 20, 2023 at 4:55 AM Hao Jia <jiahao.os@xxxxxxxxxxxxx> wrote:
kindly ping...
On 2023/3/8 Hao Jia wrote:
If core-sched is enabled, sometimes we will traverse each CPU on the core
to find the highest-priority task @max on the entire core, and then try
to find, on each sibling rq, a runnable task whose cookie matches @max's.
But in the following case, the runnable task we choose is not the best one:
	  core max: task2 (cookie 0)

     rq0                            rq1
task0(cookie non-zero)          task2(cookie 0)
task1(cookie 0)
task3(cookie 0)
...

pick-task: idle                 pick-task: task2
CPU0 and CPU1 are two CPUs on the same core; task0 and task2 are the
highest-priority tasks on rq0 and rq1 respectively, and task2 is @max
for the entire core.
Because @max has a zero cookie, instead of continuing to search rq0 for a
runnable task that matches @max's cookie, we choose idle for rq0 directly.
Here it is clearly better to choose task1 to run on rq0, which would
increase CPU utilization.
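To make the miss concrete, here is a minimal user-space sketch (a toy model, not the kernel code) of the cookie-matched lookup described above. The struct task, rq0_tree[] and find_cookie_match() names are made up for illustration; the point is only that, because zero-cookie tasks are not enqueued in the per-rq core_tree today, a lookup for @max's zero cookie on rq0 finds nothing and rq0 is forced idle even though task1 and task3 are runnable:

/* Toy user-space model of the situation above -- not kernel code. */
#include <stdio.h>
#include <stddef.h>

struct task {
	const char *name;
	unsigned long cookie;	/* 0 == no cookie */
	int prio;		/* lower value == higher priority */
};

/*
 * Stand-in for searching rq->core_tree for a runnable task whose
 * cookie matches @cookie; returns NULL when nothing matches (-> idle).
 */
static struct task *find_cookie_match(struct task **tree, int nr,
				      unsigned long cookie)
{
	struct task *best = NULL;

	for (int i = 0; i < nr; i++) {
		if (tree[i]->cookie == cookie &&
		    (!best || tree[i]->prio < best->prio))
			best = tree[i];
	}
	return best;
}

int main(void)
{
	struct task task0 = { "task0", 0xbeef, 1 };	/* non-zero cookie */
	/*
	 * task1 and task3 have a zero cookie, so with today's behaviour
	 * they are never enqueued in rq0's core_tree at all.
	 */
	struct task *rq0_tree[] = { &task0 };
	unsigned long max_cookie = 0;	/* @max is task2 (cookie 0) on rq1 */

	struct task *pick = find_cookie_match(rq0_tree, 1, max_cookie);

	printf("rq0 picks: %s\n", pick ? pick->name : "idle");	/* "idle" */
	return 0;
}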
Therefore, queue tasks with a zero cookie in the core_tree as well, and
record the number of non-zero-cookie tasks on each rq to detect the
sched-core state.
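A matching sketch of the proposed bookkeeping (again a user-space toy mirroring the diff below, with made-up enqueue()/toy_rq names; the cookied_count field comes from the diff): every task goes into the tree, only non-zero-cookie tasks bump the per-rq counter, so the zero-cookie lookup on rq0 now finds task1, while "are there any cookied tasks?" checks can use the counter instead of tree emptiness:

/* Toy user-space model of the proposed bookkeeping -- not kernel code. */
#include <stdio.h>
#include <stddef.h>

struct task {
	const char *name;
	unsigned long cookie;	/* 0 == no cookie */
	int prio;		/* lower value == higher priority */
};

struct toy_rq {
	struct task *tree[8];	/* stand-in for rq->core_tree */
	int nr;
	int cookied_count;	/* number of non-zero-cookie tasks */
};

/* Mirrors the changed sched_core_enqueue(): enqueue everyone, count cookied. */
static void enqueue(struct toy_rq *rq, struct task *p)
{
	if (p->cookie)
		rq->cookied_count++;
	rq->tree[rq->nr++] = p;
}

static struct task *find_cookie_match(struct toy_rq *rq, unsigned long cookie)
{
	struct task *best = NULL;

	for (int i = 0; i < rq->nr; i++) {
		if (rq->tree[i]->cookie == cookie &&
		    (!best || rq->tree[i]->prio < best->prio))
			best = rq->tree[i];
	}
	return best;
}

int main(void)
{
	struct toy_rq rq0 = { .nr = 0, .cookied_count = 0 };
	struct task task0 = { "task0", 0xbeef, 1 };
	struct task task1 = { "task1", 0, 2 };
	struct task task3 = { "task3", 0, 3 };

	enqueue(&rq0, &task0);
	enqueue(&rq0, &task1);
	enqueue(&rq0, &task3);

	struct task *pick = find_cookie_match(&rq0, 0);	/* @max's cookie */

	printf("rq0 picks: %s (cookied_count=%d)\n",
	       pick ? pick->name : "idle", rq0.cookied_count);
	/* -> "rq0 picks: task1 (cookied_count=1)" */
	return 0;
}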
I do remember this as a known issue (more of a known-but-unimplemented
optimization) that happens when you have a high-priority non-cookie
task ahead of several low-priority ones on the same
thread/rq. Adding +Vineeth Pillai to see if he remembers the issue.
I can try to take a look at it this week to make sense of your patch.
The code in upstream has changed quite a bit since we did this work on
the older kernels, so allow some time to page it all in.
Meanwhile, could you please provide some more details of your
use case/workload and how this patch improves it? Also, out of
curiosity, is ByteDance using core scheduling for security or
something else?
Thanks,
- Joel
Signed-off-by: Hao Jia <jiahao.os@xxxxxxxxxxxxx>
---
kernel/sched/core.c | 29 +++++++++++++++--------------
kernel/sched/core_sched.c | 9 ++++-----
kernel/sched/sched.h | 1 +
3 files changed, 20 insertions(+), 19 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index af017e038b48..765cd14c52e1 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -236,8 +236,8 @@ void sched_core_enqueue(struct rq *rq, struct task_struct *p)
 {
 	rq->core->core_task_seq++;
 
-	if (!p->core_cookie)
-		return;
+	if (p->core_cookie)
+		rq->cookied_count++;
 
 	rb_add(&p->core_node, &rq->core_tree, rb_sched_core_less);
 }
@@ -246,11 +246,16 @@ void sched_core_dequeue(struct rq *rq, struct task_struct *p, int flags)
 {
 	rq->core->core_task_seq++;
 
+	if (p->core_cookie)
+		rq->cookied_count--;
+
 	if (sched_core_enqueued(p)) {
 		rb_erase(&p->core_node, &rq->core_tree);
 		RB_CLEAR_NODE(&p->core_node);
 	}
 
+	if (!sched_core_enabled(rq))
+		return;
 	/*
 	 * Migrating the last task off the cpu, with the cpu in forced idle
 	 * state. Reschedule to create an accounting edge for forced idle,
@@ -370,7 +375,7 @@ static void sched_core_assert_empty(void)
 	int cpu;
 
 	for_each_possible_cpu(cpu)
-		WARN_ON_ONCE(!RB_EMPTY_ROOT(&cpu_rq(cpu)->core_tree));
+		WARN_ON_ONCE(cpu_rq(cpu)->cookied_count);
 }
 
 static void __sched_core_enable(void)