On Mon, Mar 20, 2023 at 4:55 AM Hao Jia <jiahao.os@xxxxxxxxxxxxx> wrote:
kindly ping...
On 2023/3/8 Hao Jia wrote:
If core-sched is enabled, sometimes we will traverse each CPU on the core
to find the highest-priority task @max on the entire core, and then try
to find, on each sibling rq, a runnable task whose cookie matches @max's.
But in the following case, the runnable task we choose is not the best one:
	  core max: task2 (cookie 0)

     rq0                            rq1
task0(cookie non-zero)          task2(cookie 0)
task1(cookie 0)
task3(cookie 0)
...

pick-task: idle                 pick-task: task2
CPU0 and CPU1 are two CPUs on the same core; task0 and task2 are the
highest-priority tasks on rq0 and rq1 respectively, and task2 is @max
for the entire core.
Because @max has a zero cookie, instead of continuing to search rq0 for a
runnable task that matches @max's cookie, we choose idle for rq0 directly.
Here it is clearly better to choose task1 to run on rq0, which would
increase CPU utilization.
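To make the miss concrete, here is a minimal user-space sketch (a toy model, not the kernel code) of the cookie-matched lookup described above. The struct task, rq0_tree[] and find_cookie_match() names are made up for illustration; the point is only that, because zero-cookie tasks are not enqueued in the per-rq core_tree today, a lookup for @max's zero cookie on rq0 finds nothing and rq0 is forced idle even though task1 and task3 are runnable:

/* Toy user-space model of the situation above -- not kernel code. */
#include <stdio.h>
#include <stddef.h>

struct task {
	const char *name;
	unsigned long cookie;	/* 0 == no cookie */
	int prio;		/* lower value == higher priority */
};

/*
 * Stand-in for searching rq->core_tree for a runnable task whose
 * cookie matches @cookie; returns NULL when nothing matches (-> idle).
 */
static struct task *find_cookie_match(struct task **tree, int nr,
				      unsigned long cookie)
{
	struct task *best = NULL;

	for (int i = 0; i < nr; i++) {
		if (tree[i]->cookie == cookie &&
		    (!best || tree[i]->prio < best->prio))
			best = tree[i];
	}
	return best;
}

int main(void)
{
	struct task task0 = { "task0", 0xbeef, 1 };	/* non-zero cookie */
	/*
	 * task1 and task3 have a zero cookie, so with today's behaviour
	 * they are never enqueued in rq0's core_tree at all.
	 */
	struct task *rq0_tree[] = { &task0 };
	unsigned long max_cookie = 0;	/* @max is task2 (cookie 0) on rq1 */

	struct task *pick = find_cookie_match(rq0_tree, 1, max_cookie);

	printf("rq0 picks: %s\n", pick ? pick->name : "idle");	/* "idle" */
	return 0;
}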
Therefore, queue tasks with a zero cookie in the core_tree as well, and
record the number of non-zero-cookie tasks on each rq to detect the
sched-core state.
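A matching sketch of the proposed bookkeeping (again a user-space toy mirroring the diff below, with made-up enqueue()/toy_rq names; the cookied_count field comes from the diff): every task goes into the tree, only non-zero-cookie tasks bump the per-rq counter, so the zero-cookie lookup on rq0 now finds task1, while "are there any cookied tasks?" checks can use the counter instead of tree emptiness:

/* Toy user-space model of the proposed bookkeeping -- not kernel code. */
#include <stdio.h>
#include <stddef.h>

struct task {
	const char *name;
	unsigned long cookie;	/* 0 == no cookie */
	int prio;		/* lower value == higher priority */
};

struct toy_rq {
	struct task *tree[8];	/* stand-in for rq->core_tree */
	int nr;
	int cookied_count;	/* number of non-zero-cookie tasks */
};

/* Mirrors the changed sched_core_enqueue(): enqueue everyone, count cookied. */
static void enqueue(struct toy_rq *rq, struct task *p)
{
	if (p->cookie)
		rq->cookied_count++;
	rq->tree[rq->nr++] = p;
}

static struct task *find_cookie_match(struct toy_rq *rq, unsigned long cookie)
{
	struct task *best = NULL;

	for (int i = 0; i < rq->nr; i++) {
		if (rq->tree[i]->cookie == cookie &&
		    (!best || rq->tree[i]->prio < best->prio))
			best = rq->tree[i];
	}
	return best;
}

int main(void)
{
	struct toy_rq rq0 = { .nr = 0, .cookied_count = 0 };
	struct task task0 = { "task0", 0xbeef, 1 };
	struct task task1 = { "task1", 0, 2 };
	struct task task3 = { "task3", 0, 3 };

	enqueue(&rq0, &task0);
	enqueue(&rq0, &task1);
	enqueue(&rq0, &task3);

	struct task *pick = find_cookie_match(&rq0, 0);	/* @max's cookie */

	printf("rq0 picks: %s (cookied_count=%d)\n",
	       pick ? pick->name : "idle", rq0.cookied_count);
	/* -> "rq0 picks: task1 (cookied_count=1)" */
	return 0;
}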
I do remember this as a known issue (more of a known-but-unimplemented
optimization) that happens when you have a high-priority non-cookie
task ahead of several low-priority ones on the same
thread/rq. Adding +Vineeth Pillai to see if he remembers the issue.
I can try to take a look at it this week to make sense of your patch.
The code in upstream has changed quite a bit since we did this work on
the older kernels, so allow some time to page it all in.
Meanwhile, could you please provide some more details of your
use case/workload and how this patch improves it? Also, out of
curiosity, is ByteDance using core scheduling for security or
something else?
Thanks,
- Joel
Signed-off-by: Hao Jia <jiahao.os@xxxxxxxxxxxxx>
---
kernel/sched/core.c | 29 +++++++++++++++--------------
kernel/sched/core_sched.c | 9 ++++-----
kernel/sched/sched.h | 1 +
3 files changed, 20 insertions(+), 19 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index af017e038b48..765cd14c52e1 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -236,8 +236,8 @@ void sched_core_enqueue(struct rq *rq, struct task_struct *p)
 {
 	rq->core->core_task_seq++;
 
-	if (!p->core_cookie)
-		return;
+	if (p->core_cookie)
+		rq->cookied_count++;
 
 	rb_add(&p->core_node, &rq->core_tree, rb_sched_core_less);
 }
@@ -246,11 +246,16 @@ void sched_core_dequeue(struct rq *rq, struct task_struct *p, int flags)
 {
 	rq->core->core_task_seq++;
 
+	if (p->core_cookie)
+		rq->cookied_count--;
+
 	if (sched_core_enqueued(p)) {
 		rb_erase(&p->core_node, &rq->core_tree);
 		RB_CLEAR_NODE(&p->core_node);
 	}
 
+	if (!sched_core_enabled(rq))
+		return;
 	/*
 	 * Migrating the last task off the cpu, with the cpu in forced idle
 	 * state. Reschedule to create an accounting edge for forced idle,
@@ -370,7 +375,7 @@ static void sched_core_assert_empty(void)
 	int cpu;
 
 	for_each_possible_cpu(cpu)
-		WARN_ON_ONCE(!RB_EMPTY_ROOT(&cpu_rq(cpu)->core_tree));
+		WARN_ON_ONCE(cpu_rq(cpu)->cookied_count);
 }
 
 static void __sched_core_enable(void)