Re: [PATCH 11/34] sched_ext: Enforce scheduler ownership when updating slice and dsq_vtime
From: Tejun Heo
Date: Fri Feb 27 2026 - 17:25:57 EST
Hello, Andrea.
On Thu, Feb 26, 2026 at 04:13:33PM +0100, Andrea Righi wrote:
> My concern with this is that we may introduce some overhead for those
> schedulers that require frequent adjustment of slice / dsq_vtime directly.
I'm a bit skeptical about the premise. Unless p->scx.vtime/slice are used
for BPF side book-keeping, the only times they need to be modified are:
- When inserting into a vtime DSQ, vtime needs to be set. However, the
interface functions already have provisions for setting vtime, so direct
manipulation isn't necessary.
- slice can be similar, but it can also be a bit more complicated. slice
only takes effect once the task actually gets on the CPU, and a task's
eventual slice may not be known when it is inserted into a user DSQ. In
such cases, it may be necessary to set the slice when the task starts
executing, e.g. from ops.running().
- While a task is running, slice modification can be used to give the task
more or less CPU time. Most commonly, this means either extending the
slice to keep the current task running or preempting it by setting the
slice to zero and triggering a scheduling event.
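To make the three cases concrete, here is a minimal userspace model in C. It
is not sched_ext code: struct model_task and all model_* names are
hypothetical stand-ins. model_set_slice() plays the role of a slice-setting
kfunc, and model_dsq_insert_vtime() plays the role of an insertion call such
as scx_bpf_dsq_insert_vtime() that takes vtime as an argument. The counter
shows that vtime never needs a direct write and that slice writes line up
one-to-one with scheduling operations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical userspace model, NOT the sched_ext API. The two fields
 * stand in for p->scx.slice and p->scx.dsq_vtime.
 */
struct model_task {
	uint64_t slice;     /* remaining slice in ns; 0 means "give up the CPU" */
	uint64_t dsq_vtime; /* ordering key while queued on a vtime DSQ */
};

#define MODEL_SLICE_DFL 20000000ULL /* hypothetical default slice, 20ms */

/* Number of direct slice updates, each standing in for one setter kfunc call. */
static unsigned int model_setter_calls;

/* Models an insertion interface that takes vtime as an argument, in the
 * spirit of scx_bpf_dsq_insert_vtime(): no separate vtime write needed. */
static void model_dsq_insert_vtime(struct model_task *p, uint64_t slice,
				   uint64_t vtime)
{
	p->slice = slice;
	p->dsq_vtime = vtime;
}

/* Hypothetical setter; the only path that counts as a direct update. */
static void model_set_slice(struct model_task *p, uint64_t slice)
{
	model_setter_calls++;
	p->slice = slice;
}

/* Case 2: the real slice is only known once the task starts running,
 * e.g. from ops.running(). */
static void model_on_running(struct model_task *p, uint64_t slice)
{
	model_set_slice(p, slice);
}

/* Case 3a: let the current task keep running for extra_ns more. */
static void model_extend_slice(struct model_task *p, uint64_t extra_ns)
{
	model_set_slice(p, p->slice + extra_ns);
}

/* Case 3b: zero the slice; returns true to signal that the caller must
 * also trigger a scheduling event for the preemption to take effect. */
static bool model_preempt(struct model_task *p)
{
	model_set_slice(p, 0);
	return true;
}
```

In a real scheduler, the preemption case would be paired with a CPU kick,
e.g. scx_bpf_kick_cpu() with SCX_KICK_PREEMPT.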
So, as long as p->scx.vtime/slice are used to instruct the kernel what to
do, as opposed to being used for BPF side book-keeping, vtime never needs
to be modified directly, and while slice may need to be, those
modifications are mostly tied directly to actual scheduling operations and
context switches. I'd be surprised if the kfunc overhead is noticeable at
all. kfunc calls aren't expensive unless you're banging on them in a tight
loop. Also, note that in the lowest overhead scheduling scenario, direct
dispatch to a local DSQ from select_cpu()/enqueue(), neither is needed.
It'd just be a single scx_bpf_dsq_insert() call.
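The enqueue path can be modeled the same way. Again, this is a hypothetical
userspace sketch, not sched_ext code: model_dsq_insert() and
model_dsq_insert_vtime() only mimic the shape of scx_bpf_dsq_insert() and
scx_bpf_dsq_insert_vtime() (enqueue flags are omitted), and the
found_idle_cpu flag stands in for whatever idle-CPU selection the scheduler
does in select_cpu()/enqueue(). Both branches cost one call and no separate
slice or vtime setter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model, NOT the sched_ext API. */
enum model_dsq {
	MODEL_DSQ_LOCAL,      /* stands in for the built-in local DSQ */
	MODEL_DSQ_USER_VTIME, /* stands in for a user-created vtime DSQ */
};

struct model_task {
	enum model_dsq dsq;
	uint64_t slice;
	uint64_t dsq_vtime;
};

#define MODEL_SLICE_DFL 20000000ULL /* hypothetical default slice, 20ms */

/* Counts every modeled kfunc call made on the enqueue path. */
static unsigned int model_kfunc_calls;

/* Stand-in for scx_bpf_dsq_insert(): DSQ and slice in one call. */
static void model_dsq_insert(struct model_task *p, enum model_dsq dsq,
			     uint64_t slice)
{
	model_kfunc_calls++;
	p->dsq = dsq;
	p->slice = slice;
}

/* Stand-in for scx_bpf_dsq_insert_vtime(): vtime is passed in, too. */
static void model_dsq_insert_vtime(struct model_task *p, enum model_dsq dsq,
				   uint64_t slice, uint64_t vtime)
{
	model_kfunc_calls++;
	p->dsq = dsq;
	p->slice = slice;
	p->dsq_vtime = vtime;
}

/* Direct dispatch when an idle CPU was found, otherwise queue on the
 * vtime DSQ. Either way: exactly one call, no separate setter. */
static void model_enqueue(struct model_task *p, bool found_idle_cpu,
			  uint64_t vtime)
{
	if (found_idle_cpu)
		model_dsq_insert(p, MODEL_DSQ_LOCAL, MODEL_SLICE_DFL);
	else
		model_dsq_insert_vtime(p, MODEL_DSQ_USER_VTIME,
				       MODEL_SLICE_DFL, vtime);
}
```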
> While the scx_task_on_sched() check itself has likely zero impact, the
> kfunc invocations can potentially introduce measurable overhead.
>
> I'm wondering if we could instead delegate the authority check at
> verification time, introducing something similar to PTR_TRUSTED
> (PTR_SCX_AUTH?) to struct task_struct * to represent that the scheduler has
> authority to access the task and allow direct writes to p->scx.slice /
> p->scx.dsq_vtime only when the register has that flag.
>
> Then:
> - for tasks passed from the core opts (enqueue, dispatch, etc.) we
> automatically tag them with PTR_SCX_AUTH,
> - tasks obtained externally (e.g., via bpf_task_from_pid()): they don't
> have the flag (so no modification allowed) and in this case maybe we
> provide a scx_bpf_auth_task() kfunc to perform the scx_task_on_sched()
> check that returns p (or NULL) setting the auth flag if the scheduler
> has full access to the task.
So, I'm not sure this is something we need to invest complexity into. The
only case I can think of where the overhead might become visible is if the
BPF sched uses these fields for internal bookkeeping and keeps updating
them far more often than there are actual scheduling events. However, I
don't think that's a usage model that we want to encourage.
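One way to keep setter calls tied to scheduling events is to keep the
bookkeeping in scheduler-owned state and write slice only when the task is
about to run. A hypothetical userspace sketch: struct model_task_ctx stands
in for per-task BPF-side state (e.g. something kept via
bpf_task_storage_get()), and the budget/runtime policy and 100us floor are
purely illustrative assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace model, NOT the sched_ext API. */
struct model_task {
	uint64_t slice; /* stands in for p->scx.slice */
};

/* Scheduler-private per-task state, deliberately kept out of p->scx. */
struct model_task_ctx {
	uint64_t runtime_ns; /* runtime consumed in the current period */
	uint64_t budget_ns;  /* runtime allowed per period */
};

#define MODEL_MIN_SLICE_NS 100000ULL /* hypothetical floor, 100us */

/* Each call stands in for one slice-setting kfunc invocation. */
static unsigned int model_setter_calls;

static void model_set_slice(struct model_task *p, uint64_t slice)
{
	model_setter_calls++;
	p->slice = slice;
}

/* Frequent accounting touches only private state: no setter call. */
static void model_account(struct model_task_ctx *c, uint64_t delta_ns)
{
	c->runtime_ns += delta_ns;
}

/* At an actual scheduling event (the task starting to run), translate the
 * private state into a single write of the kernel-visible slice. */
static void model_on_running(struct model_task *p,
			     const struct model_task_ctx *c)
{
	uint64_t left = c->budget_ns > c->runtime_ns ?
			c->budget_ns - c->runtime_ns : 0;

	model_set_slice(p, left > MODEL_MIN_SLICE_NS ? left : MODEL_MIN_SLICE_NS);
}
```

However many accounting updates happen between scheduling events, the
kernel-visible field is written once per event.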
Thanks.
--
tejun