Re: [PATCH] sched_ext/scx_flatcg: Fix cvtime_delta race and add hweight scaling to bypass charging
From: Andrea Righi
Date: Mon Jun 08 2026 - 01:58:42 EST
Hi Wanwu,
On Mon, Jun 08, 2026 at 12:22:30PM +0800, Wanwu Li wrote:
> From: Wanwu Li <liwanwu@xxxxxxxxxx>
>
> 1. cgrp_cap_budget() used __sync_fetch_and_sub(&cgc->cvtime_delta,
> cgc->cvtime_delta) to atomically read and clear cvtime_delta. However,
> this is not a true atomic read-clear operation: the second argument
> (cgc->cvtime_delta) is evaluated as a normal read before the atomic
> fetch_and_sub executes. If a concurrent __sync_fetch_and_add() happens
> between the read and the sub, the added value gets included in the
> returned delta AND remains in cvtime_delta, causing double charging.
>
> Example:
> CPU 0 runs cgrp_cap_budget(), CPU 1 runs fcg_stopping().
> Assume cvtime_delta = 100 initially.
>
> T1 CPU 0: sub_val = cvtime_delta = 100 cvtime_delta = 100
> T2 CPU 1: __sync_fetch_and_add(&cvtime_delta, 10) cvtime_delta = 110
> T3 CPU 0: __sync_fetch_and_sub(&cvtime_delta, sub_val) cvtime_delta = 10
> returns old=110
>
> delta = 110 (includes the 10 from CPU 1), but cvtime_delta = 10
> (the 10 also remains). The 10 is charged twice: once in delta
> (applied to cgv_node->cvtime) and once in the residual cvtime_delta
> (fetched again next time).
>
> Fix by using __sync_fetch_and_and(&cgc->cvtime_delta, 0).
> Disassembly comparison:
> (1) delta = __sync_fetch_and_sub(&cgc->cvtime_delta, cgc->cvtime_delta);
> 228: (79) r7 = *(u64 *)(r9 +40)
> 229: (87) r7 = -r7
> 230: (db) r7 = atomic64_fetch_add((u64 *)(r9 +40), r7) //r9 may be changed
>
> (2) delta = __sync_fetch_and_and(&cgc->cvtime_delta, 0);
> 228: (b7) r8 = 0
> 229: (db) r8 = atomic64_xchg((u64 *)(r9 +40), r8)
>
> 2. The bypass charging path in fcg_stopping() charges raw execution time
> to cvtime_delta without scaling by the inverse of the cgroup hweight.
> Since cvtime_delta is eventually applied to cgv_node->cvtime which is
> in vtime space (weight-scaled), the bypass path should also scale by
> FCG_HWEIGHT_ONE / hweight to match the units used by the dispatch path.
>
> Signed-off-by: Wanwu Li <liwanwu@xxxxxxxxxx>
We should probably add:
Fixes: a4103eacc2ab ("sched_ext: Add a cgroup scheduler which uses flattened hierarchy")
Other than that, looks good to me:
Reviewed-by: Andrea Righi <arighi@xxxxxxxxxx>
Thanks,
-Andrea
> ---
> tools/sched_ext/scx_flatcg.bpf.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/tools/sched_ext/scx_flatcg.bpf.c b/tools/sched_ext/scx_flatcg.bpf.c
> index fec359581826..ffabf0d9f32e 100644
> --- a/tools/sched_ext/scx_flatcg.bpf.c
> +++ b/tools/sched_ext/scx_flatcg.bpf.c
> @@ -256,7 +256,7 @@ static void cgrp_cap_budget(struct cgv_node *cgv_node, struct fcg_cgrp_ctx *cgc)
> * and thus can't be updated and repositioned. Instead, we collect the
> * vtime deltas separately and apply it asynchronously here.
> */
> - delta = __sync_fetch_and_sub(&cgc->cvtime_delta, cgc->cvtime_delta);
> + delta = __sync_fetch_and_and(&cgc->cvtime_delta, 0);
> cvtime = cgv_node->cvtime + delta;
>
> /*
> @@ -570,7 +570,8 @@ void BPF_STRUCT_OPS(fcg_stopping, struct task_struct *p, bool runnable)
> cgc = find_cgrp_ctx(cgrp);
> if (cgc) {
> __sync_fetch_and_add(&cgc->cvtime_delta,
> - p->se.sum_exec_runtime - taskc->bypassed_at);
> + (p->se.sum_exec_runtime - taskc->bypassed_at) *
> + FCG_HWEIGHT_ONE / (cgc->hweight ?: 1));
> taskc->bypassed_at = 0;
> }
> bpf_cgroup_release(cgrp);
> --
> 2.25.1
>