Re: [RFC][PATCH] sched/ext: Split curr|donor references properly

From: Andrea Righi

Date: Sun Dec 07 2025 - 04:12:46 EST


On Sun, Dec 07, 2025 at 09:54:32AM +0100, Andrea Righi wrote:
> On Fri, Dec 05, 2025 at 09:47:24PM -0500, Joel Fernandes wrote:
> > On Sat, Dec 06, 2025 at 12:14:45AM +0000, John Stultz wrote:
> > > With proxy-exec, we want to do the accounting against the donor
> > > most of the time. Without proxy-exec, there should be no
> > > difference as the rq->donor and rq->curr are the same.
> > >
> > > So rework the logic to reference the rq->donor where appropriate.
> > >
> > > Also add donor info to scx_dump_state()
> > >
> > > Since CONFIG_SCHED_PROXY_EXEC currently depends on
> > > !CONFIG_SCHED_CLASS_EXT, this should have no effect
> > > (other than the extra donor output in scx_dump_state),
> > > but this is one step needed to eventually remove that
> > > constraint for proxy-exec.
> > >
> > > Just wanted to send this out for early review prior to LPC.
> > >
> > > Feedback or thoughts would be greatly appreciated!
> >
> > Hi John,
> >
> > I'm wondering if this will work well for BPF tasks because my understanding
> > is that some scheduler BPF programs also monitor runtime statistics. If they
> > are unaware of proxy execution, how will it work?
>
> Right, some schedulers are relying on p->scx.slice to evaluate task
> runtime. It'd be nice for the BPF schedulers to be aware of the donor.
>
> >
> > I don't see any code in the patch that passes the donor information to the
> > BPF ops, for instance. I would really like the SCX folks to chime in before
> > we can move this patch forward. Thanks for marking it as an RFC.
> >
> > We need to get a handle on how a scheduler BPF program will pass information
> > about the donor to the currently executing task. If we can make this happen
> > transparently, that's ideal. Otherwise, we may have to pass both the donor
> > task and the currently executing task to the BPF ops.
>
> That's what I was thinking, callbacks like ops.running(), ops.tick() and
> ops.stopping() should probably have a struct task_struct *donor argument in
> addition to struct task_struct *p. Then the BPF scheduler can decide how to
> use the donor information (this would also address the runtime evaluation).

Or, better, have a kfunc like the following (I'm just sketching it, this is
likely broken):

__bpf_kfunc struct task_struct *scx_bpf_task_donor(const struct task_struct *p)
{
	struct task_struct *curr, *donor;
	struct rq *rq;

#ifndef CONFIG_SCHED_PROXY_EXEC
	return (struct task_struct *)p;
#endif

	rq = task_rq(p);
	curr = READ_ONCE(rq->curr);
	donor = READ_ONCE(rq->donor);

	/*
	 * If @p is currently executing, return the donor.
	 *
	 * The donor can be:
	 *  - same as curr (no proxy execution active)
	 *  - different from curr (proxy execution: curr is running with
	 *    donor's context)
	 */
	if (curr == p)
		return donor;

	/*
	 * If @p is not currently executing (queued, sleeping, etc.),
	 * the concept of donor doesn't apply, return @p itself.
	 */
	return (struct task_struct *)p;
}

And then let the BPF scheduler decide how to use this information (while
still updating the time slice and checking sched_class accordingly, as John
is proposing).
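
To make the intended semantics a bit more concrete, here is a small
userspace model of the lookup above and of how a scheduler might charge
runtime with it. This is only a sketch, not kernel or BPF code: struct task,
struct rq, task_donor(), charge_runtime() and donor_model_selftest() are
hypothetical stand-ins made up for illustration, and it assumes the
scheduler wants to account runtime to the donor rather than to the task
that is physically running.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for task_struct and rq; not the kernel types. */
struct rq;

struct task {
	const char *comm;
	uint64_t runtime_ns;	/* runtime charged by the model scheduler */
	struct rq *rq;		/* runqueue the task belongs to */
};

struct rq {
	struct task *curr;	/* task whose code is executing */
	struct task *donor;	/* task whose scheduling context is in use */
};

/* Same decision as the sketched kfunc: only a running task has a donor. */
static struct task *task_donor(struct task *p)
{
	struct rq *rq = p->rq;

	if (rq->curr == p)
		return rq->donor;
	return p;
}

/* Model of an ops.stopping()-style hook that charges runtime to the donor. */
static void charge_runtime(struct task *p, uint64_t delta_ns)
{
	task_donor(p)->runtime_ns += delta_ns;
}

void donor_model_selftest(void)
{
	struct rq rq = { NULL, NULL };
	struct task owner = { "owner", 0, &rq };
	struct task waiter = { "waiter", 0, &rq };

	/* No proxy execution: curr and donor are the same task. */
	rq.curr = rq.donor = &owner;
	charge_runtime(&owner, 100);
	assert(owner.runtime_ns == 100);

	/* Proxy execution: owner runs with waiter's scheduling context. */
	rq.donor = &waiter;
	charge_runtime(&owner, 50);
	assert(owner.runtime_ns == 100 && waiter.runtime_ns == 50);

	/* waiter is not rq.curr, so it maps to itself. */
	assert(task_donor(&waiter) == &waiter);
}
```

In this model a scheduler that prefers to charge the running task can
simply skip the task_donor() call, which is the kind of policy choice the
kfunc approach would leave to the BPF side.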

-Andrea