Re: [PATCH] sched/ext: Add cpumask to skip unsuitable dispatch queues
From: Qiliang Yuan
Date: Wed Feb 04 2026 - 06:37:25 EST
Hi Andrea,

I have fixed those issues in v2:
https://lore.kernel.org/all/20260204093435.3915393-1-realwujing@xxxxxxxxx/
On Tue, Feb 03, 2026 at 09:37:14AM +0100, Andrea Righi wrote:
> Did you run some benchmarks / have some numbers?
I'm still collecting detailed benchmark numbers. Theoretically, though, a single
bitwise cpumask_or() should be much cheaper than an O(N) DSQ scan, which incurs
multiple cache misses while dereferencing each queued task's structure, even for
small queues.
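To make the saved scan concrete, here is a userspace sketch of the consume-side
check (the struct, names, and single-word mask are illustrative stand-ins for
the real kernel structures, not the actual patch):

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative model: 64 CPUs, one word standing in for a struct cpumask. */
struct fake_dsq {
	uint64_t cpus_allowed;	/* union of affinities of all queued tasks */
};

/*
 * A CPU trying to consume can reject the whole DSQ with one bit test when
 * no queued task may run on it, instead of walking the queue and
 * dereferencing every task just to find nothing eligible.
 */
static bool fake_dsq_cpu_may_consume(const struct fake_dsq *dsq, int cpu)
{
	return (dsq->cpus_allowed >> cpu) & 1;
}
```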
> It's true that we save the O(N) scan when the DSQ has no eligible tasks, but we're
> adding cost on every enqueue: cpumask_or() on potentially large cpumasks can be
> expensive.
> ... for small queues or mixed workloads, the cpumask overhead probably exceeds
> the savings...
To minimize the enqueue overhead, I've optimized the dispatch_enqueue() path in v2:
- Use cpumask_copy() instead of cpumask_or() when the task is the first one in the DSQ.
- Skip the cpumask_or() update if the DSQ's cpus_allowed mask is already full.
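In userspace-model form, the two fast paths look roughly like this (again a
sketch with a single-word mask and made-up names, not the v2 code itself):

```c
#include <stdint.h>

#define FAKE_FULL_MASK	UINT64_MAX	/* stands in for cpumask_full() */

struct fake_dsq {
	uint64_t cpus_allowed;	/* union of queued tasks' affinities */
	unsigned int nr;	/* number of queued tasks */
};

/*
 * The enqueue-path fast paths described above:
 * - first task in the DSQ: plain copy, no read-modify-write
 * - mask already full: nothing new can be added, skip the OR entirely
 * - otherwise: fall back to the bitwise OR
 */
static void fake_dsq_enqueue(struct fake_dsq *dsq, uint64_t task_mask)
{
	if (dsq->nr == 0)
		dsq->cpus_allowed = task_mask;		/* cpumask_copy() */
	else if (dsq->cpus_allowed != FAKE_FULL_MASK)
		dsq->cpus_allowed |= task_mask;		/* cpumask_or() */
	dsq->nr++;
}
```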
> The cpumask is only updated during enqueue and cleared when the queue empties. If a
> task's affinity changes while it's already in the queue (i.e., sched_setaffinity()),
> the cpus_allowed mask becomes stale.
Fixed in v2. I've added a hook in set_cpus_allowed_scx() to update the DSQ's
cpus_allowed mask whenever a task's affinity changes while it is enqueued in a DSQ.
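One note on what that hook can and cannot do cheaply, sketched in the same
userspace model (this is my reading of the invariant, not necessarily the exact
v2 code): keeping "every queued task's affinity is a subset of cpus_allowed"
only requires adding the new bits; bits the task dropped cannot be cleared
without a rescan, since another queued task may still need them, so the mask
may over-approximate until the queue empties.

```c
#include <stdint.h>

struct fake_dsq {
	uint64_t cpus_allowed;	/* may over-approximate the true union */
};

/*
 * Hypothetical hook body: OR in the task's new affinity. Dropped bits are
 * left set (conservative superset); they are flushed when the queue
 * empties and the mask is rebuilt from scratch.
 */
static void fake_dsq_affinity_changed(struct fake_dsq *dsq, uint64_t new_mask)
{
	dsq->cpus_allowed |= new_mask;
}
```

A superset is safe here: it can only cause a CPU to scan a DSQ it could have
skipped, never to miss an eligible task.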
> I don't see the corresponding kfree() in the cleanup path.
Fixed in v2. I've added an RCU callback (free_dsq_rcu_callback) to explicitly free
dsq->cpus_allowed before freeing the DSQ structure itself.
Also, I've restricted the cpumask allocation to user-defined DSQs only, as built-in
DSQs (local, global, bypass) don't need this optimization.
Thanks,
Qiliang