Re: [RFC][PATCH 2/2] sched: proxy-exec: Add allow/prevent_migration hooks in the sched classes for proxy_tag_curr
From: Peter Zijlstra
Date: Thu Mar 05 2026 - 09:54:41 EST
On Thu, Mar 05, 2026 at 09:32:05AM +0530, K Prateek Nayak wrote:
> Hello Peter,
>
> On 3/4/2026 6:48 PM, Peter Zijlstra wrote:
> > +static inline void set_proxy_task(struct task_struct *p)
> > {
> > - if (!sched_proxy_exec())
> > - return;
> > - /*
> > - * pick_next_task() calls set_next_task() on the chosen task
> > - * at some point, which ensures it is not push/pullable.
> > - * However, the chosen/donor task *and* the mutex owner form an
> > - * atomic pair wrt push/pull.
> > - *
> > - * Make sure owner we run is not pushable. Unfortunately we can
> > - * only deal with that by means of a dequeue/enqueue cycle. :-/
> > - */
> > - dequeue_task(rq, owner, DEQUEUE_NOCLOCK | DEQUEUE_SAVE);
> > - enqueue_task(rq, owner, ENQUEUE_NOCLOCK | ENQUEUE_RESTORE);
> > + WARN_ON_ONCE(p->migration_flags & MDF_PROXY);
> > + p->migration_flags |= MDF_PROXY;
> > + p->migration_disabled++;
> > +}
> > +
> > +static inline void put_proxy_task(struct task_struct *p)
> > +{
> > + WARN_ON_ONCE(!(p->migration_flags & MDF_PROXY));
> > + p->migration_flags &= ~MDF_PROXY;
> > + p->migration_disabled--;
>
> Note: I'm not too familiar with the set_affinity bits so my
> understanding might be all wrong but ...
>
> Don't the set_affinity bits have a completion-based wait for tasks
> that are migration disabled? If we have a case like:
>
>     P0                                  P1
>     ==                                  ==
>
>     migrate_disable()
>       p->migration_disabled++; /* 1 */
>     ...
>     mutex_lock()
>     ...
>
>     /* preempted */
>     /* proxied */
>     set_proxy_task(P0)
>       p->migration_disabled++; /* 2 */
>     ...
>     /* Continues running */
>                                         set_cpus_allowed_ptr(P0)
>                                         /* Task CPU not in the new mask. */
>                                         affine_move_task()
>                                         /*
>                                          * blocks as per the comment
>                                          * above affine_move_task().
>                                          */
>     migrate_enable()
>       if (p->migration_disabled > 1)
>         p->migration_disabled--; /* 1 */
>         return;
>     ...
>     mutex_unlock();
>     /* Goes into schedule. */
>     put_proxy_task(P0)
>       p->migration_disabled--; /* 0 */
>
>     /* !!! Who does the migration + wakeup? !!! */
>
>
> Isn't it up to the last migrate_enable() (or in this case,
> put_proxy_task()) to schedule in the stopper and push the prev to
> another CPU? Or is it handled in some other way?
Indeed; so I think we can fix that by doing something like the below.
Have actual migrate_{dis,en}able {inc,dec}rement by 2 and have
this proxy thing {inc,dec} by 1.
Then have migrate_enable() ignore the low bit, such that 3->1 does the
slow-path and issues the completion.
So we only have the low bit set for the current proxy task; IOW any
non-current task will have it clear.
Then there are 3 sites that do the completion:
- migrate_cpu_stop()
- affine_move_task(); (1) when the mask fits the current location
- affine_move_task(); (2) when the mask doesn't fit and the task is not
running
migrate_cpu_stop() is safe, because the stopper task can neither block
nor be the owner of a lock and must thus exist outside of PE;
furthermore, if it is running, no other task is running and thus the
target task cannot have the low bit set.
affine_move_task() case (1) is obviously fine.
affine_move_task() case (2) is also fine, because similar to
migrate_cpu_stop() the target task is found to not be running, and
therefore it cannot have the low bit set.
*lightbulb*... but, doesn't that mean that we don't need any of this at
all, and could simply make sure RT/DL refuse to migrate task_on_cpu(p)?
---
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2412,8 +2412,8 @@ static inline void __migrate_enable(void
return;
#endif
- if (p->migration_disabled > 1) {
- p->migration_disabled--;
+ if (p->migration_disabled > 3) {
+ p->migration_disabled -= 2;
return;
}
@@ -2430,7 +2430,7 @@ static inline void __migrate_enable(void
* select_fallback_rq) get confused.
*/
barrier();
- p->migration_disabled = 0;
+ p->migration_disabled -= 2;
this_rq_pinned()--;
}
@@ -2445,13 +2445,13 @@ static inline void __migrate_disable(voi
*/
WARN_ON_ONCE((s16)p->migration_disabled < 0);
#endif
- p->migration_disabled++;
+ p->migration_disabled += 2;
return;
}
guard(preempt)();
this_rq_pinned()++;
- p->migration_disabled = 1;
+ p->migration_disabled += 2;
}
#else /* !COMPILE_OFFSETS */
static inline void __migrate_disable(void) { }
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2410,11 +2410,7 @@ static void migrate_disable_switch(struc
.flags = SCA_MIGRATE_DISABLE,
};
- if (likely(!p->migration_disabled))
- return;
-
- if ((p->migration_flags & MDF_PROXY) &&
- p->migration_disabled == 1)
+ if (likely(!(p->migration_disabled & ~1)))
return;
if (p->cpus_ptr != &p->cpus_mask)
@@ -6758,15 +6754,13 @@ find_proxy_task(struct rq *rq, struct ta
static inline void set_proxy_task(struct task_struct *p)
{
- WARN_ON_ONCE(p->migration_flags & MDF_PROXY);
- p->migration_flags |= MDF_PROXY;
+ WARN_ON_ONCE(p->migration_disabled & 1);
p->migration_disabled++;
}
static inline void put_proxy_task(struct task_struct *p)
{
- WARN_ON_ONCE(!(p->migration_flags & MDF_PROXY));
- p->migration_flags &= ~MDF_PROXY;
+ WARN_ON_ONCE(!(p->migration_disabled & 1));
p->migration_disabled--;
}
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1368,7 +1368,6 @@ static inline int cpu_of(struct rq *rq)
}
#define MDF_PUSH 0x01
-#define MDF_PROXY 0x02
static inline bool is_migration_disabled(struct task_struct *p)
{