Re: [PATCH 4/6] sched_ext: bypass mode shouldn't depend on ops.select_cpu()

From: David Vernet
Date: Thu Oct 10 2024 - 14:15:31 EST


On Wed, Oct 09, 2024 at 11:41:00AM -1000, Tejun Heo wrote:

Hi Tejun,

> Bypass mode was depending on ops.select_cpu() which, like the rest of the BPF
> scheduler, can't be trusted. Always enable and use scx_select_cpu_dfl() in
> bypass mode.

Could you please clarify why we can't trust ops.select_cpu()? Even if it
returns a bogus (e.g. offline) CPU, shouldn't core.c take care of finding
a valid CPU for us in select_fallback_rq()?
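
For reference, here's roughly the guard I have in mind, paraphrased from
memory from select_task_rq() in kernel/sched/core.c (treat the exact shape
as approximate rather than a quote of the current tree):

	static inline
	int select_task_rq(struct task_struct *p, int cpu, int wake_flags)
	{
		if (p->nr_cpus_allowed > 1 && !is_migration_disabled(p))
			cpu = p->sched_class->select_task_rq(p, cpu, wake_flags);
		else
			cpu = cpumask_any(p->cpus_ptr);

		/* anything bogus from the class hook gets corrected here */
		if (unlikely(!is_cpu_allowed(p, cpu)))
			cpu = select_fallback_rq(task_cpu(p), p);

		return cpu;
	}

so even a garbage return from ops.select_cpu() should land the task on a
legal CPU before it gets enqueued anywhere.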

Assuming we really do require a valid CPU here in bypass mode, do we need
to reset the state of the idle masks in the !scx_builtin_idle_enabled case?
The masks won't necessarily reflect the set of online CPUs if we haven't
been updating them, right?
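
To make that second question concrete, I'm imagining something along these
lines when entering bypass mode (purely a sketch on my end; the idle_masks
names come from the built-in idle tracking state in ext.c, and where
exactly this would live is an open question):

	/*
	 * If the scheduler implements ops.update_idle() without
	 * %SCX_OPS_KEEP_BUILTIN_IDLE, the built-in idle masks haven't been
	 * maintained and may be stale. Reinitialize them before
	 * scx_select_cpu_dfl() starts consulting them in bypass mode.
	 */
	if (!static_branch_unlikely(&scx_builtin_idle_enabled)) {
		cpumask_copy(idle_masks.cpu, cpu_online_mask);
		cpumask_copy(idle_masks.smt, cpu_online_mask);
	}

Treating all online CPUs as idle is obviously too optimistic, but it at
least keeps scx_select_cpu_dfl() from looking at stale bits.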

Thanks,
David

>
> Signed-off-by: Tejun Heo <tj@xxxxxxxxxx>
> ---
> kernel/sched/ext.c | 28 +++++++++++++++-------------
> 1 file changed, 15 insertions(+), 13 deletions(-)
>
> diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
> index 1cae18000de1..c5cda7368de5 100644
> --- a/kernel/sched/ext.c
> +++ b/kernel/sched/ext.c
> @@ -3126,7 +3126,7 @@ static int select_task_rq_scx(struct task_struct *p, int prev_cpu, int wake_flag
> if (unlikely(wake_flags & WF_EXEC))
> return prev_cpu;
>
> - if (SCX_HAS_OP(select_cpu)) {
> + if (SCX_HAS_OP(select_cpu) && !scx_rq_bypassing(task_rq(p))) {
> s32 cpu;
> struct task_struct **ddsp_taskp;
>
> @@ -3191,7 +3191,7 @@ void __scx_update_idle(struct rq *rq, bool idle)
> {
> int cpu = cpu_of(rq);
>
> - if (SCX_HAS_OP(update_idle)) {
> + if (SCX_HAS_OP(update_idle) && !scx_rq_bypassing(rq)) {
> SCX_CALL_OP(SCX_KF_REST, update_idle, cpu_of(rq), idle);
> if (!static_branch_unlikely(&scx_builtin_idle_enabled))
> return;
> @@ -4254,21 +4254,23 @@ bool task_should_scx(struct task_struct *p)
> * the DISABLING state and then cycling the queued tasks through dequeue/enqueue
> * to force global FIFO scheduling.
> *
> - * a. ops.enqueue() is ignored and tasks are queued in simple global FIFO order.
> - * %SCX_OPS_ENQ_LAST is also ignored.
> + * - ops.select_cpu() is ignored and the default select_cpu() is used.
> *
> - * b. ops.dispatch() is ignored.
> + * - ops.enqueue() is ignored and tasks are queued in simple global FIFO order.
> + * %SCX_OPS_ENQ_LAST is also ignored.
> *
> - * c. balance_scx() does not set %SCX_RQ_BAL_KEEP on non-zero slice as slice
> - * can't be trusted. Whenever a tick triggers, the running task is rotated to
> - * the tail of the queue with core_sched_at touched.
> + * - ops.dispatch() is ignored.
> *
> - * d. pick_next_task() suppresses zero slice warning.
> + * - balance_scx() does not set %SCX_RQ_BAL_KEEP on non-zero slice as slice
> + * can't be trusted. Whenever a tick triggers, the running task is rotated to
> + * the tail of the queue with core_sched_at touched.
> *
> - * e. scx_bpf_kick_cpu() is disabled to avoid irq_work malfunction during PM
> - * operations.
> + * - pick_next_task() suppresses zero slice warning.
> *
> - * f. scx_prio_less() reverts to the default core_sched_at order.
> + * - scx_bpf_kick_cpu() is disabled to avoid irq_work malfunction during PM
> + * operations.
> + *
> + * - scx_prio_less() reverts to the default core_sched_at order.
> */
> static void scx_ops_bypass(bool bypass)
> {
> @@ -4338,7 +4340,7 @@ static void scx_ops_bypass(bool bypass)
>
> rq_unlock_irqrestore(rq, &rf);
>
> - /* kick to restore ticks */
> + /* resched to restore ticks and idle state */
> resched_cpu(cpu);
> }
> }
> --
> 2.46.2
>
