[PATCH 2/5] sched_ext: Add comments to scx_bypass() for bypass depth semantics
From: zhidao su
Date: Fri Mar 06 2026 - 09:04:35 EST
From: Su Zhidao <suzhidao@xxxxxxxxxx>
The bypass depth counter (scx_bypass_depth) uses WRITE_ONCE/READ_ONCE
to communicate that it can be observed locklessly from softirq context,
even though modifications are serialized by bypass_lock. The existing
code did not explain this pattern or the re-queue loop's role in
propagating the bypass state change to all CPUs.
Add inline comments to clarify:
- Why bypass_depth uses WRITE_ONCE/READ_ONCE despite lock protection
- How the dequeue/enqueue cycle propagates bypass state to all per-CPU queues
Signed-off-by: Su Zhidao <suzhidao@xxxxxxxxxx>
---
kernel/sched/ext.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 56ff5874af94..053d99c58802 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -4229,6 +4229,14 @@ static void scx_bypass(bool bypass)
if (bypass) {
u32 intv_us;
+ /*
+	 * Increment bypass depth. Only the first caller (depth 0->1)
+	 * needs to set up the bypass state; subsequent callers just
+	 * increment the counter and return. The depth counter is
+	 * protected by bypass_lock but READ_ONCE/WRITE_ONCE are used
+	 * to communicate that the value can be observed locklessly
+	 * (e.g., from scx_bypass_lb_timerfn() in softirq context).
+	 */
WRITE_ONCE(scx_bypass_depth, scx_bypass_depth + 1);
WARN_ON_ONCE(scx_bypass_depth <= 0);
if (scx_bypass_depth != 1)
@@ -4263,6 +4271,10 @@ static void scx_bypass(bool bypass)
*
* This function can't trust the scheduler and thus can't use
* cpus_read_lock(). Walk all possible CPUs instead of online.
+ *
+ * The dequeue/enqueue cycle forces tasks through the updated code
+ * paths: in bypass mode, do_enqueue_task() routes to the per-CPU
+ * bypass DSQ instead of calling ops.enqueue().
*/
for_each_possible_cpu(cpu) {
struct rq *rq = cpu_rq(cpu);
--
2.43.0