[PATCH] [QUESTION] sched/fair: Potential vruntime underflow and unconstrained vlag scaling in rescale_entity()
From: Chen Jinghuang
Date: Thu May 14 2026 - 09:51:51 EST
Hi all,
While analyzing cgroup weight adjustment scenarios in EEVDF, I observed a
potential vruntime underflow issue caused by unconstrained vlag scaling in
rescale_entity(). I would like to consult the community on whether this
behavior is expected or if it represents a bug in the current implementation.
I noticed this in a trace from a multi-level cgroup environment:
CPU 3
CURRENT: PID: 12485 TASK: ffff003027f49440 COMMAND: "cpu_sim"
ROOT_TASK_GROUP: ffffd714095439c0 CFS_RQ: ffff002fbfa3d140
TASK_GROUP: ffff00211fdfe800 CFS_RQ: ffff00213f2c9800 <throttle_test>
TASK_GROUP: ffff20300f2e7000 CFS_RQ: ffff0021190ca000 <case_cpu_idle>
TASK_GROUP: ffff203016880c00 CFS_RQ: ffff00211ece4c00 <child_1>
[120] PID: 12485 TASK: ffff003027f49440 COMMAND: "cpu_sim" [CURRENT]
TASK_GROUP: ffff203016884000 CFS_RQ: ffff002156835c00 <child_2>
[120] PID: 12649 TASK: ffff003027f4e540 COMMAND: "cpu_sim"
Trace Metrics (Before/After rescale_entity and update_load_set):
Before: weight: 209715, avruntime: 7738112562, vlag: 3638691801, vruntime: 4099420761
After: weight: 614, avruntime: 7738112562, vlag: 1242814741118, vruntime: 4099420761, limit: 724001488558
(vruntime/avruntime stay unchanged; the scaling only touches vlag, deadline, and vprot)
The weight drop (209715 -> 614) during __sched_group_set_shares() causes se->vlag
to explode in rescale_entity(), surging from 3638691801 to 1242814741118.
When the entity's vruntime is subsequently updated via se->vruntime =
avruntime - se->vlag, the massive vlag value leads to an underflow of
se->vruntime.
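
To make the arithmetic concrete, here is a minimal userspace sketch that replays the trace values. It is not kernel code: scale_vlag() and place_vruntime() are hypothetical helper names, the first mirrors the div64_long(se->vlag * old_weight, weight) step shown in the diff context below, and the second assumes vruntime is an unsigned 64-bit value.

```c
#include <stdint.h>

/*
 * Hypothetical model of the vlag rescaling step:
 *   vlag' = vlag * old_weight / new_weight
 * (signed 64-bit, truncating division).
 */
static int64_t scale_vlag(int64_t vlag, int64_t old_weight, int64_t new_weight)
{
	return vlag * old_weight / new_weight;
}

/*
 * Hypothetical model of se->vruntime = avruntime - se->vlag, assuming an
 * unsigned 64-bit vruntime. When vlag > avruntime the result wraps around.
 */
static uint64_t place_vruntime(uint64_t avruntime, int64_t vlag)
{
	return avruntime - (uint64_t)vlag;
}
```

With the trace values, scale_vlag(3638691801, 209715, 614) gives 1242814741118, matching the "After" line. place_vruntime(7738112562, 1242814741118) then wraps: read as a signed value it is -1235076628556, i.e. a very large number when treated as u64.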
Furthermore, I noticed that while entity_lag() typically applies a limit (calculated
as 724001488558 in this instance) to constrain se->vlag, rescale_entity() performs
the scaling without any such boundary checks. This allows se->vlag to exceed the
theoretical limits expected by the EEVDF algorithm.
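
For illustration, a small userspace sketch of the symmetric clamp I have in mind. clamp_vlag() is a hypothetical helper, not an existing kernel function; it only models clamping a signed lag to [-limit, limit], taking the limit as an input rather than deriving it.

```c
#include <stdint.h>

/*
 * Hypothetical model of clamping a signed lag to [-limit, limit].
 * 'limit' is assumed to be non-negative.
 */
static int64_t clamp_vlag(int64_t vlag, int64_t limit)
{
	if (vlag > limit)
		return limit;
	if (vlag < -limit)
		return -limit;
	return vlag;
}
```

With the trace values, clamp_vlag(1242814741118, 724001488558) returns 724001488558, while the pre-rescale vlag of 3638691801 is already within the limit and passes through unchanged.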
Questions:
1. Is this se->vruntime underflow during drastic weight reduction considered acceptable
within the current EEVDF design?
2. Should rescale_entity() apply a limit check (similar to entity_lag()) immediately
after scaling the vlag to prevent it from escaping reasonable bounds?
Something like this?
Signed-off-by: Chen Jinghuang <chenjinghuang2@xxxxxxxxxx>
---
kernel/sched/fair.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 3ebec186f982..351e2f7b4b28 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4046,6 +4046,15 @@ rescale_entity(struct sched_entity *se, unsigned long weight, bool rel_vprot)
 	 */
 	se->vlag = div64_long(se->vlag * old_weight, weight);
+	{
+		u64 max_slice = cfs_rq_max_slice(cfs_rq_of(se)) + TICK_NSEC;
+		s64 limit;
+
+		limit = calc_delta_fair(max_slice, se);
+
+		se->vlag = clamp(se->vlag, -limit, limit);
+	}
+
 	/*
 	 * DEADLINE
 	 * --------
--
2.34.1