Re: [PATCH v26 00/10] Simple Donor Migration for Proxy Execution
From: Peter Zijlstra
Date: Fri Apr 03 2026 - 10:41:16 EST
On Fri, Apr 03, 2026 at 07:13:29PM +0530, K Prateek Nayak wrote:
> Hello Peter,
>
> On 4/3/2026 4:58 PM, Peter Zijlstra wrote:
> > On Fri, Apr 03, 2026 at 03:55:22PM +0530, K Prateek Nayak wrote:
> >>>> @@ -4256,6 +4277,15 @@ int try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
> >>>> */
> >>>> smp_cond_load_acquire(&p->on_cpu, !VAL);
> >>>>
> >>>> + /*
> >>>> + * We never clear the blocked_on relation on proxy_deactivate.
> >>>> + * If we don't clear it here, we have TASK_RUNNING + p->blocked_on
> >>>> + * when waking up. Since this is a fully blocked, off CPU task
> >>>> + * waking up, it should be safe to clear the blocked_on relation.
> >>>> + */
> >>>> + if (task_is_blocked(p))
> >>>> + clear_task_blocked_on(p, NULL);
> >>>> +
> >>>
> >>> Aah, yes! This is when find_proxy_task() hits deactivate() for us and we
> >>> skip ttwu_runnable(). We still need to clear ->blocked_on.
> >
> > I wonder, should we have proxy_deactivate() do this instead?
>
> That is one way to tackle that, yes!

OK, let's put it there. At that site we already know task_is_blocked()
and we get less noise in the wakeup path.

Or should we perhaps put it in block_task() itself? The moment you're
off the runqueue, ->blocked_on becomes meaningless.

> >>>
> >>>> cpu = select_task_rq(p, p->wake_cpu, &wake_flags);
> >>>> if (task_cpu(p) != cpu) {
> >>>> if (p->in_iowait) {
> >>>> @@ -6977,6 +7007,10 @@ static void __sched notrace __schedule(int sched_mode)
> >>>> switch_count = &prev->nvcsw;
> >>>> }
> >>>>
> >>>> + /* See: https://github.com/kudureranganath/linux/commit/0d6a01bb19db39f045d6f0f5fb4d196500091637 */
> >>>> + if (!prev_state && task_is_blocked(prev))
> >>>> + clear_task_blocked_on(prev, NULL);
> >>>> +
> >>>
> >>> This one confuses me, ttwu() should never result in ->blocked_on being
> >>> set.
> >>
> >> This is from the signal_pending_state() in try_to_block_task() putting
> >> prev to TASK_RUNNING while still having p->blocked_on.
> >
> > Ooh, I misread that. I saw that set_task_blocked_on_waking(, NULL) in
> > there and thought it would clear. Damn, sometimes reading is so very
> > hard...
> >
> > Anyhow, with my changes on, try_to_block_task() is only ever called from
> > __schedule() on current. This means this could in fact be
> > clear_task_blocked_on(), right?
>
> Yes, that should work too but there is also the case you mentioned on
> the parallel thread - if ttwu() doesn't see p->blocked_on, it won't
> clear it and if ttwu_runnable() wins we can simply go into
> __schedule() with TASK_RUNNING + p->blocked_on with no guarantee that
> same task will be picked again. Then the task gets preempted and stays
> in an illegal state.
>
> Clearing of ->blocked_on in schedule also safeguards against that race:
>
> o Scenario 1: ttwu_runnable() wins
>
>   CPU0                                    CPU1
>
>   LOCK ACQUIRE
>   [W] ->blocked_on = lock                 [R] ->__state
>   [W] ->__state = state;                  RMB
>                                           [R] ->blocked_on
>                                           [W] ->__state = RUNNING
>
>                 /* Synchronized by rq_lock */
>
>   MB
>   [R] if (!->__state &&
>   [R]     ->blocked_on)
>   [W]   ->blocked_on = NULL; /* Safeguard */
>
>
> o Scenario 2: __schedule() wins
>
>   CPU0                                    CPU1
>
>   LOCK ACQUIRE
>   [W] ->blocked_on = lock                 [R] ->__state
>   [W] ->__state = state;
>
>                 /* Synchronized by rq_lock */
>   MB
>   [W] __block_task()
>
>                                           MB
>                                           /* Full wakeup. */
>                                           [R] if (->blocked_on)
>                                           [W]   ->blocked_on = NULL
>

Oh Boohoo :-( Yes, you're quite right.

It is find_proxy_task() that is affected, so this fixup should
perhaps live in the existing sched_proxy_exec() branch?

Something like so?

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2222,6 +2222,13 @@ static bool block_task(struct rq *rq, st
{
int flags = DEQUEUE_NOCLOCK;
+ /*
+ * We're being taken off the runqueue, cannot still be blocked_on
+ * anything. This also means that delay_dequeue can not have
+ * blocked_on.
+ */
+ clear_task_blocked_on(p, NULL);
+
p->sched_contributes_to_load =
(task_state & TASK_UNINTERRUPTIBLE) &&
!(task_state & TASK_NOLOAD) &&
@@ -6614,7 +6621,7 @@ static bool try_to_block_task(struct rq
if (signal_pending_state(task_state, p)) {
WRITE_ONCE(p->__state, TASK_RUNNING);
*task_state_p = TASK_RUNNING;
- set_task_blocked_on_waking(p, NULL);
+ clear_task_blocked_on(p, NULL);
return false;
}
@@ -7043,6 +7050,14 @@ static void __sched notrace __schedule(i
if (sched_proxy_exec()) {
struct task_struct *prev_donor = rq->donor;
+ /*
+ * There is a race between ttwu() and __mutex_lock_common()
+ * where it is possible for the mutex code to call into
+ * schedule() with ->blocked_on still set.
+ */
+ if (!prev_state && prev->blocked_on)
+ clear_task_blocked_on(prev, NULL);
+
rq_set_donor(rq, next);
if (unlikely(next->blocked_on)) {
next = find_proxy_task(rq, next, &rf);
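
For anyone who wants to poke at the two orderings outside the kernel, here
is a tiny user-space toy of the state transitions discussed above. It is a
sketch only: every toy_* name, the state constants and the with_fixup switch
are made up for illustration and are not kernel definitions. There is no
locking or memory ordering in it; it only replays the interleavings
sequentially to show which end states violate "TASK_RUNNING implies no
->blocked_on".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for illustration only; NOT the kernel's definitions. */
#define TOY_TASK_RUNNING		0u
#define TOY_TASK_UNINTERRUPTIBLE	2u

struct toy_mutex { int unused; };

struct toy_task {
	unsigned int state;		/* models ->__state */
	struct toy_mutex *blocked_on;	/* models ->blocked_on */
	bool on_rq;			/* still on the (toy) runqueue? */
};

static void toy_clear_blocked_on(struct toy_task *p)
{
	p->blocked_on = NULL;
}

/* Models the mutex slowpath: set ->blocked_on, then the sleeping state. */
static void toy_mutex_block(struct toy_task *p, struct toy_mutex *m,
			    unsigned int state)
{
	assert(p && m);
	p->blocked_on = m;
	p->state = state;
}

/*
 * Models Scenario 1: the waker wins and flips the task back to running
 * without having observed (and thus without clearing) ->blocked_on.
 */
static void toy_ttwu_runnable_wins(struct toy_task *p)
{
	p->state = TOY_TASK_RUNNING;
}

/*
 * Models the scheduler entry. with_fixup is a hypothetical switch that
 * enables the "!prev_state && prev->blocked_on" safeguard. When the task
 * really blocks, the toy block path clears ->blocked_on as it leaves the
 * runqueue, mirroring the block_task() hunk.
 */
static void toy_schedule(struct toy_task *prev, bool with_fixup)
{
	unsigned int prev_state = prev->state;

	if (with_fixup && !prev_state && prev->blocked_on)
		toy_clear_blocked_on(prev);

	if (prev_state) {
		toy_clear_blocked_on(prev);
		prev->on_rq = false;
	}
}

/* The invariant under discussion: running tasks have no ->blocked_on. */
static bool toy_state_is_legal(const struct toy_task *p)
{
	return !(p->state == TOY_TASK_RUNNING && p->blocked_on);
}
```

Replaying Scenario 1 with with_fixup == false leaves the toy task running
with ->blocked_on still set; with the safeguard enabled it comes out clean.
Scenario 2 ends up clean either way because the block path clears it.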