Re: [PATCH 1/1] sched/core: Don't steal a proxy-exec donor
From: Vasily Gorbik
Date: Tue May 05 2026 - 06:03:44 EST
On Mon, May 04, 2026 at 06:49:05PM +0530, K Prateek Nayak wrote:
> On 5/4/2026 6:01 PM, Vasily Gorbik wrote:
...
> > Commit 7de9d4f94638 ("sched: Start blocked_on chain processing in
> > find_proxy_task()") tweaked the fair class logic so that the donor task
> > isn't migrated away while running the proxy. Do it similarly for
> > try_steal_cookie() and skip src->donor as well.
> >
> > Fixes: 7de9d4f94638 ("sched: Start blocked_on chain processing in find_proxy_task()")
> > Signed-off-by: Vasily Gorbik <gor@xxxxxxxxxxxxx>
> > ---
> > kernel/sched/core.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index b8871449d3c6..3cf5fb70814c 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -6358,7 +6358,7 @@ static bool try_steal_cookie(int this, int that)
> > return false;
> >
> > do {
> > - if (p == src->core_pick || p == src->curr)
> > + if (p == src->core_pick || p == src->curr || p == src->donor)
>
> Although this solves the issue of stealing the donor, I'm a bit
> skeptical if proxy exec even works with core scheduling at all since
> __schedule() can override the decision of core_pick and the CPU
> may end up running a task with different core-cookie if it found
> the core_pick to be blocked on a mutex :-(
I think this patch is still valid on its own.
The cookie problem probably needs to be handled separately.
Do you mean this path?
	next = pick_next_task(...);
	rq_set_donor(rq, next);
	next = find_proxy_task(...);	/* may replace next with mutex owner */
I'm trying a check in find_proxy_task(), before returning the final owner:
---
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 3cf5fb70814c..46d21ac83e72 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6952,6 +6952,12 @@ find_proxy_task(struct rq *rq, struct task_struct *donor, struct rq_flags *rf)
*/
}
 	WARN_ON_ONCE(owner && !owner->on_rq);
+
+	if (owner && !sched_cpu_cookie_match(rq, owner)) {
+		if (curr_in_chain)
+			return proxy_resched_idle(rq);
+		goto deactivate;
+	}
 	return owner;
 deactivate:
--
I'm not sure this is the right, acceptable, or sufficient fix, though.
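For readers following along, the three possible outcomes of the added check can be modelled in plain userspace C. This is only a sketch: `toy_rq`, `toy_task`, `toy_cookie_match()` and `toy_proxy_decision()` are hypothetical stand-ins, not kernel code, and the model assumes that the cookie match always succeeds when core scheduling is disabled on the CPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct task_struct and struct rq. */
struct toy_task {
	unsigned long core_cookie;
};

struct toy_rq {
	bool core_enabled;		/* assumed: core scheduling active on this CPU */
	unsigned long core_cookie;	/* cookie currently selected for the core */
};

enum proxy_outcome {
	PROXY_RETURN_OWNER,	/* falls through to "return owner" */
	PROXY_RESCHED_IDLE,	/* models "return proxy_resched_idle(rq)" */
	PROXY_DEACTIVATE,	/* models "goto deactivate" */
};

/* Assumed semantics: without core scheduling every task matches. */
static bool toy_cookie_match(const struct toy_rq *rq, const struct toy_task *p)
{
	assert(rq != NULL && p != NULL);
	if (!rq->core_enabled)
		return true;
	return rq->core_cookie == p->core_cookie;
}

/* Mirrors the branch structure of the added hunk, nothing more. */
static enum proxy_outcome toy_proxy_decision(const struct toy_rq *rq,
					     const struct toy_task *owner,
					     bool curr_in_chain)
{
	if (owner && !toy_cookie_match(rq, owner))
		return curr_in_chain ? PROXY_RESCHED_IDLE : PROXY_DEACTIVATE;
	return PROXY_RETURN_OWNER;
}
```

In this model, an owner with a different cookie is never returned while core scheduling is enabled. Whether that is the desired policy is exactly the open question here.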
With that check and with temporary debugfs counters I added, on the same
LPAR as in my initial report:
cd strace/tests && make -j$(nproc) check
gives:
attempt_total 157
attempt_cookie 106
attempt_cookie_mismatch 105
exec_total 52
exec_cookie 1
exec_cookie_mismatch 0
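The numbers above appear to be self-consistent. The small C sketch below checks that, assuming that each exec_* counter equals the matching attempt_* counter minus the attempts rejected for a cookie mismatch. That reading of the counters is an assumption, and the struct and function names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical container for the six debugfs counters listed above. */
struct proxy_counters {
	unsigned int attempt_total;
	unsigned int attempt_cookie;
	unsigned int attempt_cookie_mismatch;
	unsigned int exec_total;
	unsigned int exec_cookie;
	unsigned int exec_cookie_mismatch;
};

/*
 * Assumed relationship: every attempt that is not rejected for a cookie
 * mismatch goes on to execute, and no executed proxy has a mismatched cookie.
 */
static bool counters_consistent(const struct proxy_counters *c)
{
	/* Guard against unsigned wrap-around before subtracting. */
	if (c->attempt_cookie_mismatch > c->attempt_cookie ||
	    c->attempt_cookie > c->attempt_total)
		return false;

	return c->attempt_total - c->attempt_cookie_mismatch == c->exec_total &&
	       c->attempt_cookie - c->attempt_cookie_mismatch == c->exec_cookie &&
	       c->exec_cookie_mismatch == 0;
}
```

With the values from this run, 157 - 105 = 52 and 106 - 105 = 1, so the check passes under that assumption.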
So the strace tests do exercise mismatched proxy attempts. If there is a
better, more specific proxy-exec test to run, please let me know.