Re: [PATCH 11/34] sched_ext: Enforce scheduler ownership when updating slice and dsq_vtime
From: Andrea Righi
Date: Fri Feb 27 2026 - 18:57:22 EST
On Fri, Feb 27, 2026 at 12:25:45PM -1000, Tejun Heo wrote:
> Hello, Andrea.
>
> On Thu, Feb 26, 2026 at 04:13:33PM +0100, Andrea Righi wrote:
> > My concern with this is that we may introduce some overhead for those
> > schedulers that require frequent adjustment of slice / dsq_vtime directly.
>
> I'm a bit skeptical about the premise. Unless p->scx.vtime/slice are used
> for BPF side book-keeping, the only times they need to be modified are:
>
> - When inserting into a vtime DSQ, vtime needs to be set. However, the
> interface functions already have provisions for setting vtime, so direct
> manipulation isn't necessary.
>
> - slice can be similar but can also be a bit more complicated, as slice only
> affects when the task actually gets on the CPU and a task may not have its
> eventual slice known at the time of its insertion into a user DSQ. In such
> cases, it may be necessary to set the slice as the task starts execution
> from e.g. ops.running().
>
> - While a task is running, slice modification can be used to give the task
> more or less CPU time. Most commonly, these would be either extending
> slice to keep running the current task or preempting the task by setting
> the slice to zero and triggering a scheduling event.
>
> So, as long as p->scx.vtime/slice are used to instruct the kernel what to
> do, as opposed to being used for BPF side book-keeping, vtime doesn't need
> to be directly modified at all and while slice may need to be modified,
> those are mostly directly tied to actual scheduling operations and context
> switches. I'd be surprised if the kfunc overhead is noticeable at all. kfunc
> calls aren't expensive unless you're banging on it in a tight loop. Also,
> note that in the lowest overhead scheduling scenario - direct dispatch to a
> local DSQ from select_cpu()/enqueue() - neither is needed. It'd just be a
> single scx_bpf_dsq_insert() call.
>
> > While the scx_task_on_sched() check itself has likely zero impact, the
> > kfunc invocations can potentially introduce measurable overhead.
> >
> > I'm wondering if we could instead delegate the authority check at
> > verification time, introducing something similar to PTR_TRUSTED
> > (PTR_SCX_AUTH?) to struct task_struct * to represent that the scheduler has
> > authority to access the task and allow direct writes to p->scx.slice /
> > p->scx.dsq_vtime only when the register has that flag.
> >
> > Then:
> > - for tasks passed from the core ops (enqueue, dispatch, etc.) we
> > automatically tag them with PTR_SCX_AUTH,
> > - tasks obtained externally (e.g., via bpf_task_from_pid()): they don't
> > have the flag (so no modification allowed) and in this case maybe we
> > provide a scx_bpf_auth_task() kfunc to perform the scx_task_on_sched()
> > check that returns p (or NULL) setting the auth flag if the scheduler
> > has full access to the task.
>
> So, I'm not sure this is something we need to invest complexity into. The
> only cases I can think of where the overhead might become visible is if the
> BPF sched uses these fields for internal bookkeeping and keeps updating a
> lot more times than there are actual scheduling events. However, I don't
> think that's a usage model that we want to encourage.
Ack. Also, we don't necessarily need to make it perfect right now; we can
begin with the set_slice/set_dsq_vtime kfuncs and refine the approach later
if we find performance regressions.
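To make the semantics concrete, here is a rough user-space model of the
runtime-check approach: a setter applies the write only when the calling
scheduler owns the task. This is only a sketch, not the code from the patch.
Every model_* name and both struct layouts are made up for illustration;
only scx_task_on_sched() and the slice / dsq_vtime fields come from this
thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a scheduler instance; not a kernel type. */
struct model_sched {
	int id;
};

/* Hypothetical stand-in for the per-task state discussed in this thread. */
struct model_task {
	const struct model_sched *owner;	/* scheduler the task is on */
	uint64_t slice;
	uint64_t dsq_vtime;
};

/* Models the scx_task_on_sched() check: does @sch own @p? */
bool model_task_on_sched(const struct model_sched *sch,
			 const struct model_task *p)
{
	return p->owner == sch;
}

/* Models a set_slice kfunc: write only if the caller owns the task. */
bool model_task_set_slice(const struct model_sched *sch,
			  struct model_task *p, uint64_t slice)
{
	if (!model_task_on_sched(sch, p))
		return false;
	p->slice = slice;
	return true;
}

/* Models a set_dsq_vtime kfunc with the same ownership rule. */
bool model_task_set_dsq_vtime(const struct model_sched *sch,
			      struct model_task *p, uint64_t vtime)
{
	if (!model_task_on_sched(sch, p))
		return false;
	p->dsq_vtime = vtime;
	return true;
}

/* Usage: the owner zeroes the slice to preempt; a non-owner is refused. */
void model_example(void)
{
	struct model_sched owner = { .id = 1 }, other = { .id = 2 };
	struct model_task t = { .owner = &owner, .slice = 1000 };

	assert(!model_task_set_slice(&other, &t, 0));
	assert(model_task_set_slice(&owner, &t, 0));
	assert(t.slice == 0);
}
```

In this model the cost per update is one pointer comparison plus the call
itself, which matches the point that the check is cheap and any overhead
would come from the kfunc invocation.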
Thanks,
-Andrea