Re: [PATCH v2 0/2] sched/core: Fix proxy-exec/core-sched interactions

From: John Stultz

Date: Tue May 12 2026 - 20:48:43 EST


On Tue, May 12, 2026 at 2:17 PM John Stultz <jstultz@xxxxxxxxxx> wrote:
> On Thu, May 7, 2026 at 3:42 AM Vasily Gorbik <gor@xxxxxxxxxxxxx> wrote:
> >
> > v1 [1] consisted of a fix for a scheduler corruption where
> > try_steal_cookie() could migrate a proxy-exec donor away from the source
> > rq while that rq still used it as the active scheduling context.
> >
> > Prateek pointed out [2] a separate proxy-exec/core-sched issue: after
> > pick_next_task() selects a core cookie compatible donor, find_proxy_task()
> > can replace the execution context with a mutex owner with a different
> > cookie.
> >
> > This v2 keeps the donor steal fix as patch 1 and adds patch 2 to reject
> > mismatched final proxy owners.
> >
> > The v1 reported the issue reproduced on s390 LPAR, but it seems to be
> > easily reproducible with strace test suite "make -j$(nproc) check" on
> > any system with SMT, CONFIG_SCHED_CORE=y and CONFIG_SCHED_PROXY_EXEC=y
> > enabled, e.g. on x86 KVM with -smp cpus=16,sockets=1,cores=8,threads=2:
> >
>
> Vasily! Thank you so much for reporting this and working out fixes
> (along with K Prateek!)
>
> Apologies for being slow to reply, I've been under the weather.
>
> I really appreciate this reproducer detail, but I've so far not been
> able to trip this issue up (SCHED_CORE=y, SCHED_PROXY_EXEC=y and using
> the qemu arguments you included above). Could you mail me your .config
> in case something else is needed?

Ok, I think I was able to force it using my priority-inversion-demo by
taking the spots in the run.sh script where we kick off the
rename-test and prefixing them with `coresched new -t pid --`:
https://github.com/johnstultz-work/priority-inversion-demo/blob/main/run.sh#L89

That way the foreground and background tasks run with separate cookies,
which forces proxying across cookies, and with that I've tripped over
the issues you highlighted.
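
For anyone wanting to try the same thing, here is a rough sketch of the
kind of wrapper I mean. The helper name with_new_cookie and the DRY_RUN
switch are made up for illustration and are not in run.sh; it assumes
the coresched tool from util-linux is installed and the kernel has
CONFIG_SCHED_CORE=y:

```shell
#!/bin/sh
# Hypothetical helper (not part of run.sh): start a command under a
# freshly created core-scheduling cookie, so it cannot share an SMT
# core with tasks that carry a different cookie.
#
# Set DRY_RUN=1 to print the command line instead of executing it.
with_new_cookie() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "coresched new -t pid -- $*"
        return 0
    fi
    # "new" creates a cookie, "-t pid" scopes it to the single task,
    # and everything after "--" is the command to execute.
    coresched new -t pid -- "$@"
}

# Example (the rename-test path is a placeholder, adjust to your tree):
#   with_new_cookie ./rename-test &
```

Each invocation gets its own cookie, so a lock owner and a waiter
started this way end up with mismatched cookies.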

That said, I'm still curious to learn more about your x86 environment
and why it tripped so much more easily there, so let me know.

Your patches do seem to resolve things, but I'm also hoping to find
better ways to more thoroughly stress the proxy+core-sched logic.

thanks again!
-john