Re: [PATCH 1/2] sched_ext: Fix ops.dequeue() semantics
From: Tejun Heo
Date: Sun Dec 28 2025 - 12:19:52 EST
Hello, Andrea.
On Fri, Dec 19, 2025 at 11:43:14PM +0100, Andrea Righi wrote:
...
> + Once ``ops.enqueue()`` is called, the task is considered "enqueued" and
> + is owned by the BPF scheduler. Ownership is retained until the task is
> + either dispatched (moved to a local DSQ for execution) or dequeued
> + (removed from the scheduler due to a blocking event, or to modify a
> + property, like CPU affinity, priority, etc.). When the task leaves the
> + BPF scheduler, ``ops.dequeue()`` is invoked.
> +
> + **Important**: ``ops.dequeue()`` is called for *any* enqueued task,
> + regardless of whether the task is still on a BPF data structure, or it
> + is already dispatched to a DSQ (global, local, or user DSQ).
> +
> + This guarantees that every ``ops.enqueue()`` will eventually be followed
> + by an ``ops.dequeue()``. This makes it reliable for BPF schedulers to
> + track task ownership and maintain accurate accounting, such as per-DSQ
> + queued runtime sums.
While this works, from the BPF scheduler's POV there's no way to tell whether
an ops.dequeue() call comes from the task actually being dequeued or is the
follow-up to the dispatch operation the scheduler just performed. This won't
make much difference if ops.dequeue() is only used for accounting purposes,
but a scheduler which uses an arena data structure for queueing would likely
need to perform extra tests to tell whether the task still needs to be
dequeued from the arena side. I *think* the hot path (ops.dequeue() following
the task's dispatch) can be a simple lockless test, so this may be okay, but
from an API POV, it can probably be better.
The counter interlocking point is scx_bpf_dsq_insert(). If we can
synchronize scx_bpf_dsq_insert() and dequeue so that ops.dequeue() is not
called for a successfully inserted task, I think the semantics would be
neater: an enqueued task is either dispatched or dequeued. Due to the async
dispatch operation, this is likely difficult to do without adding extra sync
operations in scx_bpf_dsq_insert(). However, I *think* we may be able to get
rid of dspc and async inserting if we call ops.dispatch() with the rq lock
dropped. That may make the whole dispatch path simpler and the behavior
neater too. What do you think?
Thanks.
--
tejun