Re: [PATCH sched_ext/for-7.1] tools/sched_ext: Kick home CPU for stranded tasks in scx_qmap
From: Cheng-Yang Chou
Date: Mon Apr 13 2026 - 01:39:32 EST
Hi Tejun,
On Sun, Apr 12, 2026 at 05:30:52PM -1000, Tejun Heo wrote:
> scx_qmap uses global BPF queue maps (BPF_MAP_TYPE_QUEUE) that any CPU's
> ops.dispatch() can pop from. When a CPU pops a task that can't run on it
> (e.g. a pinned per-CPU kthread), it inserts the task into SHARED_DSQ.
> consume_dispatch_q() then skips the task due to affinity mismatch, leaving it
> stranded until some CPU in its allowed mask calls ops.dispatch(). This doesn't
> cause indefinite stalls -- the periodic tick keeps firing (can_stop_idle_tick()
> returns false when softirq is pending) -- but can cause noticeable scheduling
> delays.
>
> After inserting to SHARED_DSQ, kick the task's home CPU if this CPU can't run
> it. There's a small race window where the home CPU can enter idle before the
> kick lands -- if a per-CPU kthread like ksoftirqd is the stranded task, this
> can trigger a "NOHZ tick-stop error" warning. The kick arrives shortly after
> and the home CPU drains the task.
>
> Rather than fully eliminating the warning by routing pinned tasks to local or
> global DSQs, the current code keeps them going through the normal BPF queue
> path and documents the race and the resulting warning in detail. scx_qmap is an
> example scheduler and having tasks go through the usual dispatch path is useful
> for testing. The detailed comment also serves as a reference for other
> schedulers that may encounter similar warnings.
>
> Signed-off-by: Tejun Heo <tj@xxxxxxxxxx>
> ---
> v2: Replaced the previous enqueue-side fix which kicked when a pinned task was
> enqueued. That was based on the theory that ops.select_cpu() being skipped
> meant the home CPU wouldn't be woken, which wasn't quite right --
> wakeup_preempt() kicks the target CPU regardless. Moved the fix to
> ops.dispatch() where the stranding is actually observable.
>
> tools/sched_ext/scx_qmap.bpf.c | 40 ++++++++++++++++++++++++++++++++++
> 1 file changed, 40 insertions(+)
>
> diff --git a/tools/sched_ext/scx_qmap.bpf.c b/tools/sched_ext/scx_qmap.bpf.c
> index f3587fb709c9..a4543c7ab25d 100644
> --- a/tools/sched_ext/scx_qmap.bpf.c
> +++ b/tools/sched_ext/scx_qmap.bpf.c
> @@ -471,6 +471,46 @@ void BPF_STRUCT_OPS(qmap_dispatch, s32 cpu, struct task_struct *prev)
> __sync_fetch_and_add(&nr_dispatched, 1);
>
> scx_bpf_dsq_insert(p, SHARED_DSQ, slice_ns, 0);
> +
> + /*
> + * scx_qmap uses a global BPF queue that any CPU's
> + * dispatch can pop from. If this CPU popped a task that
> + * can't run here, it gets stranded on SHARED_DSQ after
> + * consume_dispatch_q() skips it. Kick the task's home
> + * CPU so it drains SHARED_DSQ.
> + *
> + * There's a race between the pop and the flush of the
> + * buffered dsq_insert:
> + *
> + * CPU 0 (dispatching) CPU 1 (home, idle)
> + * ~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~
> + * pop from BPF queue
> + * dsq_insert(buffered)
> + * balance:
> + * SHARED_DSQ empty
> + * BPF queue empty
> + * -> goes idle
> + * flush -> on SHARED
> + * kick CPU 1
> + * wakes, drains task
> + *
> + * The kick prevents indefinite stalls but a per-CPU
> + * kthread like ksoftirqd can be briefly stranded when
> + * its home CPU enters idle with softirq pending,
> + * triggering:
> + *
> + * "NOHZ tick-stop error: local softirq work is pending, handler #N!!!"
> + *
> + * from report_idle_softirq(). The kick lands shortly
> + * after and the home CPU drains the task. This could be
> + * avoided by e.g. dispatching pinned tasks to local or
> + * global DSQs, but the current code is left as-is to
> + * document this class of issue -- other schedulers
> + * seeing similar warnings can use this as a reference.
> + */
> + if (!bpf_cpumask_test_cpu(cpu, p->cpus_ptr))
> + scx_bpf_kick_cpu(scx_bpf_task_cpu(p), 0);
> +
> bpf_task_release(p);
>
> batch--;
> --
> 2.53.0
This makes sense.
I also realized my previous patch for scx_userland was unnecessary, since
the global DSQ logic already handles this case. Sorry for the noise on
that one.
Reviewed-by: Cheng-Yang Chou <yphbchou0911@xxxxxxxxx>
--
Thanks,
Cheng-Yang