Re: [PATCH sched_ext/for-6.14-fixes 1/2] sched_ext: Implement auto local dispatching of migration disabled tasks

From: Andrea Righi
Date: Fri Feb 07 2025 - 17:36:25 EST


Hi Tejun,

On Fri, Feb 07, 2025 at 10:58:23AM -1000, Tejun Heo wrote:
> Migration disabled tasks are special and pinned to their previous CPUs. They
> tripped up some unsuspecting BPF schedulers as their ->nr_cpus_allowed may
> not agree with the bits set in ->cpus_ptr. Make it easier for BPF schedulers
> by automatically dispatching them to the pinned local DSQs by default. If a
> BPF scheduler wants to handle migration disabled tasks explicitly, it can
> set SCX_OPS_ENQ_MIGRATION_DISABLED.
>
> Signed-off-by: Tejun Heo <tj@xxxxxxxxxx>
> ---
> kernel/sched/ext.c | 23 +++++++++++++++++++++++
> 1 file changed, 23 insertions(+)
>
> --- a/kernel/sched/ext.c
> +++ b/kernel/sched/ext.c
> @@ -123,6 +123,19 @@ enum scx_ops_flags {
> SCX_OPS_SWITCH_PARTIAL = 1LLU << 3,
>
> /*
> + * A migration disabled task can only execute on its current CPU. By
> + * default, such tasks are automatically put on the CPU's local DSQ with
> + * the default slice on enqueue. If this ops flag is set, they also go
> + * through ops.enqueue().
> + *
> + * A migration disabled task never invokes ops.select_cpu() as it can
> + * only select the current CPU. Also, p->cpus_ptr will only contain its
> + * current CPU while p->nr_cpus_allowed keeps tracking p->user_cpus_ptr
> + * and thus may disagree with cpumask_weight(p->cpus_ptr).
> + */
> + SCX_OPS_ENQ_MIGRATION_DISABLED = 1LLU << 4,
> +
> + /*
> * CPU cgroup support flags
> */
> SCX_OPS_HAS_CGROUP_WEIGHT = 1LLU << 16, /* cpu.weight */
> @@ -130,6 +143,7 @@ enum scx_ops_flags {
> SCX_OPS_ALL_FLAGS = SCX_OPS_KEEP_BUILTIN_IDLE |
> SCX_OPS_ENQ_LAST |
> SCX_OPS_ENQ_EXITING |
> + SCX_OPS_ENQ_MIGRATION_DISABLED |
> SCX_OPS_SWITCH_PARTIAL |
> SCX_OPS_HAS_CGROUP_WEIGHT,
> };
> @@ -882,6 +896,7 @@ static bool scx_warned_zero_slice;
>
> static DEFINE_STATIC_KEY_FALSE(scx_ops_enq_last);
> static DEFINE_STATIC_KEY_FALSE(scx_ops_enq_exiting);
> +static DEFINE_STATIC_KEY_FALSE(scx_ops_enq_migration_disabled);
> static DEFINE_STATIC_KEY_FALSE(scx_ops_cpu_preempt);
> static DEFINE_STATIC_KEY_FALSE(scx_builtin_idle_enabled);
>
> @@ -2014,6 +2029,11 @@ static void do_enqueue_task(struct rq *r
> unlikely(p->flags & PF_EXITING))
> goto local;
>
> + /* see %SCX_OPS_ENQ_MIGRATION_DISABLED */
> + if (!static_branch_unlikely(&scx_ops_enq_migration_disabled) &&
> + is_migration_disabled(p))
> + goto local;

Maybe not in this patch set, but it'd be nice to have an event counter for
this, as skipping ops.enqueue() might introduce latency issues. Having that
feedback could help determine whether we need to enable
SCX_OPS_ENQ_MIGRATION_DISABLED in some schedulers.
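
To make the counter idea concrete, here is a rough userspace sketch of the
enqueue decision with a counter on the bypass path. It only models the
branch quoted above and is not kernel code: struct mock_task, struct
mock_stats, mock_enqueue() and the counter names are all hypothetical, and
only the flag value is taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag bit as defined in the patch above. */
#define SCX_OPS_ENQ_MIGRATION_DISABLED (1ULL << 4)

/* Hypothetical stand-in for the task state the check looks at. */
struct mock_task {
	bool migration_disabled;
};

/* Hypothetical counters; the kernel has no such fields in this patch. */
struct mock_stats {
	uint64_t nr_migdis_local;	/* bypassed ops.enqueue(), went to local DSQ */
	uint64_t nr_ops_enqueue;	/* handed to the BPF scheduler's ops.enqueue() */
};

enum enq_path { ENQ_LOCAL, ENQ_OPS };

/*
 * Model of the new branch in do_enqueue_task(): unless the scheduler set
 * SCX_OPS_ENQ_MIGRATION_DISABLED, a migration disabled task goes straight
 * to the local DSQ, and the bypass is counted.
 */
static enum enq_path mock_enqueue(uint64_t ops_flags,
				  const struct mock_task *p,
				  struct mock_stats *st)
{
	if (!(ops_flags & SCX_OPS_ENQ_MIGRATION_DISABLED) &&
	    p->migration_disabled) {
		st->nr_migdis_local++;
		return ENQ_LOCAL;
	}

	st->nr_ops_enqueue++;
	return ENQ_OPS;
}
```

With something like this, a scheduler that sees nr_migdis_local growing
quickly would have a hint that it may want to set the flag and handle these
tasks itself.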

I'm also a bit conflicted about whether the default should be on or off:
we're changing the previous behavior, but OTOH this is going to prevent some
potential breakage (due to the nr_cpus_allowed mismatch) and server
workloads are going to benefit from this, so there seem to be more pros
than cons to dispatching migration disabled tasks directly by default.

I also did a quick test with this and it looks good, so:

Acked-by: Andrea Righi <arighi@xxxxxxxxxx>

-Andrea