Re: [PATCH] sched/ext: Add cpumask to skip unsuitable dispatch queues
From: Qiliang Yuan
Date: Wed Feb 04 2026 - 06:37:25 EST
Hi Andrea,

I have fixed those issues in v2:
https://lore.kernel.org/all/20260204093435.3915393-1-realwujing@xxxxxxxxx/
On Tue, Feb 03, 2026 at 09:37:14AM +0100, Andrea Righi wrote:
> Did you run some benchmarks / have some numbers?
I'm still collecting detailed benchmark numbers. Theoretically, though, a single
bitwise cpumask_or() should be much cheaper than an O(N) DSQ scan, which incurs
multiple cache misses while dereferencing each queued task's structure, even for
small queues.
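To make the saved scan concrete, here is a userspace sketch of the consume-side
check (the struct, names, and single-word mask are illustrative stand-ins for
the real kernel structures, not the actual patch):

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative model: 64 CPUs, one word standing in for a struct cpumask. */
struct fake_dsq {
	uint64_t cpus_allowed;	/* union of affinities of all queued tasks */
};

/*
 * A CPU trying to consume can reject the whole DSQ with one bit test when
 * no queued task may run on it, instead of walking the queue and
 * dereferencing every task just to find nothing eligible.
 */
static bool fake_dsq_cpu_may_consume(const struct fake_dsq *dsq, int cpu)
{
	return (dsq->cpus_allowed >> cpu) & 1;
}
```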
> It's true that we save the O(N) scan when the DSQ has no eligible tasks, but we're
> adding cost on every enqueue: cpumask_or() on potentially large cpumasks can be
> expensive.
> ... for small queues or mixed workloads, the cpumask overhead probably exceeds
> the savings...
To minimize the enqueue overhead, I've optimized the dispatch_enqueue() path in v2:
- Use cpumask_copy() instead of cpumask_or() when the task is the first one in the DSQ.
- Skip the cpumask_or() update if the DSQ's cpus_allowed mask is already full.
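In userspace-model form, the two fast paths look roughly like this (again a
sketch with a single-word mask and made-up names, not the v2 code itself):

```c
#include <stdint.h>

#define FAKE_FULL_MASK	UINT64_MAX	/* stands in for cpumask_full() */

struct fake_dsq {
	uint64_t cpus_allowed;	/* union of queued tasks' affinities */
	unsigned int nr;	/* number of queued tasks */
};

/*
 * The enqueue-path fast paths described above:
 * - first task in the DSQ: plain copy, no read-modify-write
 * - mask already full: nothing new can be added, skip the OR entirely
 * - otherwise: fall back to the bitwise OR
 */
static void fake_dsq_enqueue(struct fake_dsq *dsq, uint64_t task_mask)
{
	if (dsq->nr == 0)
		dsq->cpus_allowed = task_mask;		/* cpumask_copy() */
	else if (dsq->cpus_allowed != FAKE_FULL_MASK)
		dsq->cpus_allowed |= task_mask;		/* cpumask_or() */
	dsq->nr++;
}
```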
> The cpumask is only updated during enqueue and cleared when the queue empties. If a
> task's affinity changes while it's already in the queue (i.e., sched_setaffinity()),
> the cpus_allowed mask becomes stale.
Fixed in v2. I've added a hook in set_cpus_allowed_scx() to update the DSQ's
cpus_allowed mask whenever a task's affinity changes while it is enqueued in a DSQ.
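One note on what that hook can and cannot do cheaply, sketched in the same
userspace model (this is my reading of the invariant, not necessarily the exact
v2 code): keeping "every queued task's affinity is a subset of cpus_allowed"
only requires adding the new bits; bits the task dropped cannot be cleared
without a rescan, since another queued task may still need them, so the mask
may over-approximate until the queue empties.

```c
#include <stdint.h>

struct fake_dsq {
	uint64_t cpus_allowed;	/* may over-approximate the true union */
};

/*
 * Hypothetical hook body: OR in the task's new affinity. Dropped bits are
 * left set (conservative superset); they are flushed when the queue
 * empties and the mask is rebuilt from scratch.
 */
static void fake_dsq_affinity_changed(struct fake_dsq *dsq, uint64_t new_mask)
{
	dsq->cpus_allowed |= new_mask;
}
```

A superset is safe here: it can only cause a CPU to scan a DSQ it could have
skipped, never to miss an eligible task.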
> I don't see the corresponding kfree() in the cleanup path.
Fixed in v2. I've added an RCU callback (free_dsq_rcu_callback) to explicitly free
dsq->cpus_allowed before freeing the DSQ structure itself.
Also, I've restricted the cpumask allocation to user-defined DSQs only, as built-in
DSQs (local, global, bypass) don't need this optimization.
Thanks,
Qiliang