Re: [PATCH sched_ext/for-7.1] sched_ext: Reduce DSQ lock contention in consume_dispatch_q()

In reply to: Andrea Righi: "[PATCH sched_ext/for-7.1] sched_ext: Reduce DSQ lock contention in consume_dispatch_q()"

From: Tejun Heo

Date: Sun Mar 15 2026 - 04:58:14 EST

Hello, Andrea.

On Sun, Mar 15, 2026 at 12:52:31AM +0100, Andrea Righi wrote:
...
> Benchmarks that generate many enqueue/dispatch events (e.g., schbench)
> show around 2-3x higher throughput with most of the scx schedulers with
> this change applied.

Can you share more details about the benchmark setup and results?

> +	/*
> +	 * Use trylock to avoid spinning on a contended DSQ, if we fail to
> +	 * acquire the lock kick the CPU to retry on the next balance.
> +	 *
> +	 * In bypass mode simply spin to acquire the lock, since
> +	 * scx_kick_cpu() is suppressed.
> +	 */
> +	if (scx_bypassing(sch, cpu)) {
> +		raw_spin_lock(&dsq->lock);
> +	} else if (!raw_spin_trylock(&dsq->lock)) {
> +		scx_kick_cpu(sch, cpu, 0);
> +		return false;
> +	}

But I'm not sure this is what we wanna do. If we *really* want to do this,
maybe we can add a try_move variant; however, I'm pretty deeply skeptical
about the approach for a few reasons.

- If a shared DSQ becomes a bottleneck, the right thing to do would be
  to introduce multiple DSQs and shard across them.

- This is likely trading off fairness to gain bandwidth and, depending on
  machine / workload, may lead to severe starvation. One can argue that a
  controlled trade-off between fairness and bandwidth is useful for some
  use cases. However, even if that is the case, I don't think trylock is
  the way to get there. If we think that a low-overhead, high fan-out
  shared queue is desirable, it'd be better to introduce a dedicated data
  structure which can do so in a controlled manner.
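
To illustrate the sharding point, here is a rough userspace sketch, not
sched_ext code. It assumes one FIFO per shard, each with its own lock; the
names (NR_SHARDS, struct shard, shard_enqueue(), shard_consume()) and the
"cpu % NR_SHARDS" placement are all hypothetical. A consumer drains its home
shard first and only then walks the others, so every lock is still taken
unconditionally and no shard is skipped because it happened to be contended.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NR_SHARDS 4		/* hypothetical shard count */

struct task {
	int id;
	struct task *next;
};

/* One FIFO per shard, each protected by its own lock. */
struct shard {
	pthread_mutex_t lock;
	struct task *head;
	struct task *tail;
};

static struct shard shards[NR_SHARDS];

static void shards_init(void)
{
	for (int i = 0; i < NR_SHARDS; i++) {
		pthread_mutex_init(&shards[i].lock, NULL);
		shards[i].head = NULL;
		shards[i].tail = NULL;
	}
}

/* Append @t to the shard that @cpu maps to (assumed mapping: cpu % NR_SHARDS). */
static void shard_enqueue(int cpu, struct task *t)
{
	struct shard *s;

	assert(cpu >= 0);
	s = &shards[cpu % NR_SHARDS];

	t->next = NULL;
	pthread_mutex_lock(&s->lock);
	if (s->tail)
		s->tail->next = t;
	else
		s->head = t;
	s->tail = t;
	pthread_mutex_unlock(&s->lock);
}

/*
 * Pop one task for @cpu: try the home shard first, then the remaining
 * shards in order. Returns NULL only if every shard was empty when visited.
 */
static struct task *shard_consume(int cpu)
{
	assert(cpu >= 0);

	for (int i = 0; i < NR_SHARDS; i++) {
		struct shard *s = &shards[(cpu + i) % NR_SHARDS];
		struct task *t;

		pthread_mutex_lock(&s->lock);
		t = s->head;
		if (t) {
			s->head = t->next;
			if (!s->head)
				s->tail = NULL;
		}
		pthread_mutex_unlock(&s->lock);

		if (t)
			return t;
	}
	return NULL;
}
```

With enough shards, most consumers only ever touch their home lock, and the
fallback scan keeps the queues work-conserving. How tasks get distributed
across shards (per-CPU, per-LLC, per-node) is a policy choice this sketch
doesn't try to settle.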

Thanks.

--
tejun