Re: [RFC PATCH v15 7/7] sched: Start blocked_on chain processing in find_proxy_task()

From: Peter Zijlstra
Date: Mon Mar 17 2025 - 12:56:10 EST


On Wed, Mar 12, 2025 at 03:11:37PM -0700, John Stultz wrote:

> @@ -2950,8 +2951,15 @@ static int affine_move_task(struct rq *rq, struct task_struct *p, struct rq_flag
>  	struct set_affinity_pending my_pending = { }, *pending = NULL;
>  	bool stop_pending, complete = false;
>
> -	/* Can the task run on the task's current CPU? If so, we're done */
> -	if (cpumask_test_cpu(task_cpu(p), &p->cpus_mask)) {
> +	/*
> +	 * Can the task run on the task's current CPU? If so, we're done
> +	 *
> +	 * We are also done if the task is the current donor, boosting a lock-
> +	 * holding proxy, (and potentially has been migrated outside its
> +	 * current or previous affinity mask)
> +	 */
> +	if (cpumask_test_cpu(task_cpu(p), &p->cpus_mask) ||
> +	    (task_current_donor(rq, p) && !task_current(rq, p))) {
>  		struct task_struct *push_task = NULL;
>
>  		if ((flags & SCA_MIGRATE_ENABLE) &&

> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index f8ad3a44b3771..091f1a01b3327 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -9385,6 +9385,7 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
>  	 * 3) cannot be migrated to this CPU due to cpus_ptr, or
>  	 * 4) running (obviously), or
>  	 * 5) are cache-hot on their current CPU.
> +	 * 6) are blocked on mutexes (if SCHED_PROXY_EXEC is enabled)
>  	 */
>  	if ((p->se.sched_delayed) && (env->migration_type != migrate_load))
>  		return 0;
> @@ -9406,6 +9407,9 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
>  	if (kthread_is_per_cpu(p))
>  		return 0;
>
> +	if (task_is_blocked(p))
> +		return 0;
> +
>  	if (!cpumask_test_cpu(env->dst_cpu, p->cpus_ptr)) {
>  		int cpu;
>
> @@ -9442,7 +9446,8 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
>  	/* Record that we found at least one task that could run on dst_cpu */
>  	env->flags &= ~LBF_ALL_PINNED;
>
> -	if (task_on_cpu(env->src_rq, p)) {
> +	if (task_on_cpu(env->src_rq, p) ||
> +	    task_current_donor(env->src_rq, p)) {
>  		schedstat_inc(p->stats.nr_failed_migrations_running);
>  		return 0;
>  	}


Somehow this and the previous patches that touched upon this made me
think that perhaps we can share this handling with migrate_disable().
Specifically, we seem to be adding those donor checks and hooks to
exactly the locations that already special-case migrate_disable().

I've not actually tried, though.
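
To make the idea a bit more concrete, here is a rough user-space sketch
(not kernel code, and untested against the real tree) of what a single
shared check might look like. Everything in it is an assumption made for
illustration: struct task_model, struct rq_model and task_model_pinned()
are hypothetical names. Only the migration_disabled count and the notions
behind task_is_blocked() and task_current_donor() correspond to things in
the kernel or in the quoted patch, and the real paths involve locking and
state that this ignores.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for struct task_struct. */
struct task_model {
	int migration_disabled;	/* models the p->migration_disabled nesting count */
	bool blocked;		/* models what task_is_blocked(p) would report */
};

/* Hypothetical stand-in for struct rq; only the donor pointer is modelled. */
struct rq_model {
	const struct task_model *donor;	/* task lending its scheduling context */
};

/* Models task_current_donor(rq, p) from the quoted patch. */
static bool task_model_current_donor(const struct rq_model *rq,
				     const struct task_model *p)
{
	return rq->donor == p;
}

/*
 * Hypothetical shared predicate: one place that answers "must this task
 * stay on its current runqueue?" for migrate-disabled tasks, blocked
 * tasks and the current donor alike.
 */
static bool task_model_pinned(const struct rq_model *rq,
			      const struct task_model *p)
{
	assert(rq != NULL && p != NULL);

	if (p->migration_disabled > 0)
		return true;
	if (p->blocked)
		return true;
	return task_model_current_donor(rq, p);
}
```

With that shape, callers in the style of affine_move_task() and
can_migrate_task() would ask one question instead of open-coding each
condition. Whether the real hooks line up this neatly is exactly the
part that has not been tried.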