Re: [PATCH v24 06/11] sched: Handle blocked-waiter migration (and return migration)
From: John Stultz
Date: Sat Mar 07 2026 - 01:48:51 EST
On Mon, Dec 29, 2025 at 9:34 PM K Prateek Nayak <kprateek.nayak@xxxxxxx> wrote:
> On 11/25/2025 4:00 AM, John Stultz wrote:
> > +
> > + raw_spin_rq_lock(rq);
> > + rq_repin_lock(rq, rf);
> > + update_rq_clock(rq);
>
> I'm tempted to suggest extracting this pattern into a guard. We seem
> to always do the zap + unpin + unlock somewhere in the middle and
> lock + repin + update clock at the end of a function so how about:
>
[snipped guard logic]
>
> If the guard obfuscates the flow too much, I'm happy with the current
> approach as well but having the guard + a comment on top of its usage is
> lot more soothing on the eye IMHO :-)
>
> It also simplifies proxy_force_return() if my next set of comments aren't
> too horrible. I'll leave it to you and Peter to decide which is best.
So first of all, apologies for being so slow to get to this! I very
much appreciate your feedback here, but the first two months of this
year have been busy and I've not had the mental space to get my head
back into the proxy-exec details. I definitely did not intend to
neglect this for so long.
So on your suggestion, I'm torn. It does make the code simpler, but
I've found some guard() usage to be less readable for me, as it can
hide some subtleties.
But I also felt this way when I first ran into scoped locking and I've
started to warm to that.
So maybe if it's ok, I'll leave this for now, but let's consider it
for a future refactor?
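(For anyone following along, the shape being suggested is roughly an
"inverted" scope guard: drop the lock on entry to the scope and
re-acquire it on exit, bracketing the unlocked region. Here's a rough
userspace analogue using pthreads and the GCC/clang cleanup attribute;
the names and the pthread mapping are purely illustrative, not the
actual kernel guard machinery:)

```c
#include <assert.h>
#include <pthread.h>

/*
 * Hypothetical "unlocked section" guard: releases the lock when it
 * comes into scope and re-acquires it when the scope ends, mirroring
 * the zap-callbacks/unpin/unlock ... lock/repin/update_rq_clock
 * bracketing in the patch.
 */
struct unlocked_guard {
	pthread_mutex_t *lock;
};

static void unlocked_guard_init(struct unlocked_guard *g,
				pthread_mutex_t *lock)
{
	g->lock = lock;
	pthread_mutex_unlock(lock);	/* kernel analogue: zap + unpin + unlock */
}

static void unlocked_guard_exit(struct unlocked_guard *g)
{
	pthread_mutex_lock(g->lock);	/* kernel analogue: lock + repin + update clock */
}

#define UNLOCKED_SCOPE(g, l)						\
	struct unlocked_guard g						\
		__attribute__((cleanup(unlocked_guard_exit)));		\
	unlocked_guard_init(&g, l)

static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;

/* Returns 0 if the lock is correctly re-held after the guarded scope. */
int demo(void)
{
	int ret;

	pthread_mutex_lock(&demo_lock);
	{
		UNLOCKED_SCOPE(g, &demo_lock);
		/* Lock is dropped here: safe to take other locks in order. */
	}
	/* The cleanup ran at scope exit, so demo_lock is held again. */
	ret = pthread_mutex_trylock(&demo_lock);	/* EBUSY: already held */
	pthread_mutex_unlock(&demo_lock);
	return ret ? 0 : -1;
}
```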
> > +static void proxy_force_return(struct rq *rq, struct rq_flags *rf,
> > + struct task_struct *p)
> > +{
> > + struct rq *this_rq, *target_rq;
> > + struct rq_flags this_rf;
> > + int cpu, wake_flag = 0;
>
> Should we pretend this is a wakeup with WF_TTWU? There are some
> affine wakeup considerations that select_task_rq_fair() adds with
> WF_TTWU. Without it, we'll end up selecting the wake_cpu's LLC to
> search for the wakeup target.
Ack
> > +
> > + lockdep_assert_rq_held(rq);
> > + WARN_ON(p == rq->curr);
> > +
> > + get_task_struct(p);
>
> Since we are IRQs disabled (equivalent of RCU read-side), I don't think
> we need to grab an extra reference here since the task_struct cannot
> disappear from under us and we take the necessary locks to confirm the
> task's rq associations so we shouldn't race with anything either.
Ack.
> > +
> > + /*
> > + * We have to zap callbacks before unlocking the rq
> > + * as another CPU may jump in and call sched_balance_rq
> > + * which can trip the warning in rq_pin_lock() if we
> > + * leave callbacks set.
> > + */
> > + zap_balance_callbacks(rq);
> > + rq_unpin_lock(rq, rf);
> > + raw_spin_rq_unlock(rq);
> > +
> > + /*
> > + * We drop the rq lock, and re-grab task_rq_lock to get
> > + * the pi_lock (needed for select_task_rq) as well.
> > + */
> > + this_rq = task_rq_lock(p, &this_rf);
>
> So I'm failing to see why we need to drop the rq_lock, re-grab it via
> task_rq_lock() when we are eventually going to do a deactivate_task().
>
> Once we deactivate, wakeup path will stall at ttwu_runnable() since it
> cannot grab the __task_rq_lock() with task_on_rq_migrating() so we
> should be able to safely move the task around.
>
> Maybe my naive eyes are missing something (and perhaps there are some
> details that arrive with sleeping owner stuff) but the following
> boots fine and survives a sched-messaging and test-ww_mutex run
> without any splat on my machine at this point:
Hum. The reason we need the pi_lock is for select_task_rq(), and since
we hold the rq lock (which is under the pi_lock in the locking order),
we need to drop it to take the pi_lock and rq_lock via task_rq_lock().
I know it's not pretty, but I'm not sure how your testing avoided
tripping the lockdep_assert_held() in select_task_rq().
Trying a simplified version (without the guard) locally leaves my
console filled with splats:
WARNING: CPU: 20 PID: 0 at kernel/sched/core.c:3746 select_task_rq+0xdc/0x120
But let me know if there's something I'm being daft about.
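(To spell out the ordering constraint above: pi_lock nests outside the
rq lock, so while holding only the rq lock we can't take pi_lock
directly; we have to drop the rq lock, take pi_lock, then re-take the
rq lock, which is what task_rq_lock() does for us. A toy userspace
sketch of that dance, with pthread mutexes standing in for the real
locks, just for illustration:)

```c
#include <assert.h>
#include <pthread.h>

/*
 * Toy analogue of the pi_lock -> rq_lock ordering: "outer" must always
 * be taken before "inner". If we hold only inner and need outer, we
 * must drop inner, take outer, then re-take inner -- the same dance as
 * dropping the rq lock and re-grabbing both via task_rq_lock().
 */
static pthread_mutex_t outer = PTHREAD_MUTEX_INITIALIZER;	/* ~ p->pi_lock */
static pthread_mutex_t inner = PTHREAD_MUTEX_INITIALIZER;	/* ~ rq->lock   */

/* Called with inner held; returns with both held, in the legal order. */
static void upgrade_to_both(void)
{
	pthread_mutex_unlock(&inner);	/* ~ raw_spin_rq_unlock(rq)  */
	pthread_mutex_lock(&outer);	/* ~ taking p->pi_lock first */
	pthread_mutex_lock(&inner);	/* ~ re-taking the rq lock   */
}

/* Returns 0 if both locks end up held in the correct order. */
int demo(void)
{
	int ok;

	pthread_mutex_lock(&inner);
	upgrade_to_both();
	/* Both held now; verify, then release in reverse order. */
	ok = pthread_mutex_trylock(&outer);	/* EBUSY: we already hold it */
	pthread_mutex_unlock(&inner);
	pthread_mutex_unlock(&outer);
	return ok ? 0 : -1;
}
```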
>
> Also, we may want to just move the attach_one_task() helper to sched.h
> and use it in both proxy_migrate_task() and proxy_force_return().
>
Ah, that looks nice!
Again, thank you so much for the review feedback! I *really*
appreciate the suggestions and am eager to hear more of your thoughts!
thanks
-john