Re: [PATCH 2/6] sched: add WF_CURRENT_CPU and externise ttwu

From: Chen Yu
Date: Fri Apr 07 2023 - 23:21:16 EST


On 2023-03-07 at 23:31:57 -0800, Andrei Vagin wrote:
> From: Peter Oskolkov <posk@xxxxxxxxxx>
>
> Add WF_CURRENT_CPU wake flag that advices the scheduler to
> move the wakee to the current CPU. This is useful for fast on-CPU
> context switching use cases.
>
> In addition, make ttwu external rather than static so that
> the flag could be passed to it from outside of sched/core.c.
>
> Signed-off-by: Peter Oskolkov <posk@xxxxxxxxxx>
> Signed-off-by: Andrei Vagin <avagin@xxxxxxxxxx>
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -7569,6 +7569,10 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int wake_flags)
> if (wake_flags & WF_TTWU) {
> record_wakee(p);
>
> + if ((wake_flags & WF_CURRENT_CPU) &&
> + cpumask_test_cpu(cpu, p->cpus_ptr))
> + return cpu;
> +
I tried to reuse WF_CURRENT_CPU to mitigate cross-CPU wakeups, but it caused
regressions in some workloads that want to be spread across idle CPUs whenever
possible.
The regression occurs because the change above picks the current CPU regardless
of its load/utilization. Tasks therefore stack up on one CPU, which hurts
throughput and latency. I believe this issue would be more severe on systems
with fewer CPUs per LLC than Intel platforms, such as AMD and Arm64.
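To make the stacking effect concrete, here is a toy model in C. It is not kernel code: struct toy_rq, pick_always_current(), pick_idle_first() and simulate_wakeups() are hypothetical names, and each CPU is reduced to a single nr_running counter. One waker on this_cpu wakes several tasks in a row. The unconditional policy piles all of them onto the waker's CPU, while an idle-first policy only falls back to it once every CPU is busy.

```c
#include <assert.h>

/* Toy model, not kernel code: each CPU only tracks its runnable task count. */
#define MAX_CPUS 8

struct toy_rq {
	int nr_running;
};

/* Unconditional policy, as in the quoted hunk: always pick the waker's CPU. */
static int pick_always_current(const struct toy_rq *rq, int nr_cpus, int this_cpu)
{
	(void)rq;
	(void)nr_cpus;
	return this_cpu;
}

/* Idle-first policy: prefer any idle CPU, fall back to the waker's CPU. */
static int pick_idle_first(const struct toy_rq *rq, int nr_cpus, int this_cpu)
{
	for (int cpu = 0; cpu < nr_cpus; cpu++)
		if (rq[cpu].nr_running == 0)
			return cpu;
	return this_cpu;
}

/*
 * The waker runs on this_cpu and wakes nr_wakees tasks in a row.
 * Returns the resulting nr_running of this_cpu.
 */
static int simulate_wakeups(int (*pick)(const struct toy_rq *, int, int),
			    int nr_cpus, int this_cpu, int nr_wakees)
{
	struct toy_rq rq[MAX_CPUS] = { { 0 } };

	assert(nr_cpus > 0 && nr_cpus <= MAX_CPUS);
	assert(this_cpu >= 0 && this_cpu < nr_cpus);

	rq[this_cpu].nr_running = 1;	/* the waker itself */
	for (int i = 0; i < nr_wakees; i++)
		rq[pick(rq, nr_cpus, this_cpu)].nr_running++;

	return rq[this_cpu].nr_running;
}
```

For example, with 4 CPUs and 3 wakees, simulate_wakeups(pick_always_current, 4, 0, 3) leaves 4 runnable tasks on CPU 0, whereas simulate_wakeups(pick_idle_first, 4, 0, 3) leaves only the waker there.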

I know WF_CURRENT_CPU benefits seccomp. Could we make this change more generic,
so that it benefits other workloads too, by making the condition that triggers
WF_CURRENT_CPU stricter? For example, honor it only when the current CPU has
exactly one runnable task, and treat the current CPU as a last resort by
checking that the wakee's previous CPU is not idle. That way, the
WF_CURRENT_CPU flag could be enabled dynamically when certain conditions are
met (a short task, for example).

Thanks,
Chenyu