[tip: sched/core] sched/fair: Fix overflow in update_tg_cfs_runnable()
From: tip-bot2 for Chen, Yu C
Date: Tue Jun 30 2026 - 05:08:02 EST
The following commit has been merged into the sched/core branch of tip:
Commit-ID: 4f166adb5cb0525d9e32d45729fd8f28c80acbee
Gitweb: https://git.kernel.org/tip/4f166adb5cb0525d9e32d45729fd8f28c80acbee
Author: Chen, Yu C <yu.c.chen@xxxxxxxxx>
AuthorDate: Sat, 20 Jun 2026 11:54:22 +08:00
Committer: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
CommitterDate: Tue, 30 Jun 2026 10:56:51 +02:00
sched/fair: Fix overflow in update_tg_cfs_runnable()
A divide-by-zero crash is observed when running hackbench:
[14697.488452] CPU: 112 UID: 0 PID: 124791 Comm: hackbench Not tainted 7.1.0-rc2+
[14697.492627] RIP: 0010:propagate_entity_load_avg+0x35f/0x3e0
[14697.506799] <TASK>
[14697.507411] __dequeue_task+0x2b4/0xc70
[14697.508677] dequeue_task_fair+0x36/0x370
[14697.509047] dequeue_task+0x101/0x2f0
[14697.509426] __schedule+0x1b1/0x1a00
[14697.510868] anon_pipe_read+0x3da/0x450
[14697.511400] vfs_read+0x361/0x390
[14697.512053] __x64_sys_read+0x19/0x30
The divide-by-zero happens here:
    if (scale_load_down(gcfs_rq->load.weight)) {
        load_sum = div_u64(gcfs_rq->avg.load_sum,
                           scale_load_down(gcfs_rq->load.weight));
    }
gcfs_rq->load.weight holds an insanely large value. div_u64() truncates
the divisor to its lower 32 bits, which happen to be 0.
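The truncation can be reproduced in userspace. The following is only a
minimal sketch, not kernel code: the helper names are hypothetical, and
the weight value is an illustrative stand-in for the corrupted
load.weight. It relies on div_u64() taking a u64 dividend and a u32
divisor, so a divisor that is nonzero at full width can still reach the
division as 0.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for what div_u64(u64 dividend, u32 divisor)
 * receives as its divisor: the argument is narrowed to 32 bits.
 */
uint32_t divisor_seen_by_div_u64(uint64_t weight)
{
	return (uint32_t)weight;
}

/* Same shape as the guard in the commit: it tests the full-width value. */
int guard_passes(uint64_t weight)
{
	return weight != 0;
}

/* Illustrative value only: nonzero, but with all lower 32 bits clear. */
void check_truncated_divisor(void)
{
	uint64_t weight = 1ULL << 32;

	assert(guard_passes(weight));                 /* guard lets it through */
	assert(divisor_seen_by_div_u64(weight) == 0); /* divisor becomes 0 */
}
```

Calling check_truncated_divisor() passes silently, which shows why the
nonzero check alone does not protect the division.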
An AI-assisted investigation traced the cause to a u32 overflow in
update_tg_cfs_runnable(); flat pickup then became a victim when using
tg_tasks():
    u32 new_sum, divider;
    ...
    new_sum = se->avg.runnable_avg * divider;   <-- boom
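The overflow itself can be sketched in userspace as follows. This is a
minimal illustration under stated assumptions, not the kernel code: the
function names are hypothetical, runnable_avg is modelled as a 64-bit
value (as on a 64-bit build), and the inputs used in the check (100000
and 47742) are illustrative numbers chosen so that the product exceeds
32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Old behaviour: the product is stored in a 32-bit new_sum and wraps. */
uint32_t new_sum_u32(uint64_t runnable_avg, uint32_t divider)
{
	return (uint32_t)(runnable_avg * divider);
}

/* Fixed behaviour: the product is computed and kept in 64 bits. */
uint64_t new_sum_u64(uint64_t runnable_avg, uint32_t divider)
{
	return (uint64_t)runnable_avg * divider;
}

/* Illustrative inputs only; they are not taken from the crash report. */
void check_new_sum_widths(void)
{
	/* 100000 * 47742 = 4774200000, which is larger than 2^32 */
	assert(new_sum_u64(100000, 47742) == 4774200000ULL);
	/* 4774200000 - 4294967296 = 479232704 after wrapping */
	assert(new_sum_u32(100000, 47742) == 479232704U);
}
```

With the 32-bit result, the stored sum no longer matches runnable_avg
multiplied by the divider, which is the kind of inconsistency the
sequence below starts from.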
The following sequence shows how this triggers the crash:
propagate_entity_load_avg()
update_tg_cfs_runnable() # u32 overflow corrupts runnable_sum
__update_load_avg_cfs_rq()
___update_load_avg() # computes insane runnable_avg
update_tg_load_avg() # propagates to tg->runnable_avg
update_cfs_group()
calc_concur_shares()
tg_tasks() # long-to-int truncation, negative nr
reweight_entity() # corrupted se->load.weight
update_load_add() # corrupted cfs_rq->load.weight
propagate_entity_load_avg()
update_tg_cfs_load()
div_u64() # divide-by-zero
Fix this by widening new_sum from u32 to u64. With this fix, there is no
need to force tg_tasks() to return unsigned long.
Fixes: 95246d1ec80b ("sched/pelt: Relax the sync of runnable_sum with runnable_avg")
Assisted-by: Claude:claude-opus-4.6
Signed-off-by: Chen Yu <yu.c.chen@xxxxxxxxx>
Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
Link: https://patch.msgid.link/a22eea2b-4c4a-4623-9a44-d7b18c0c91c8@xxxxxxxxx
---
kernel/sched/fair.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index a384f16..e89edbd 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5174,7 +5174,8 @@ static inline void
update_tg_cfs_runnable(struct cfs_rq *cfs_rq, struct sched_entity *se, struct cfs_rq *gcfs_rq)
{
long delta_sum, delta_avg = gcfs_rq->avg.runnable_avg - se->avg.runnable_avg;
- u32 new_sum, divider;
+ u64 new_sum;
+ u32 divider;
/* Nothing to update */
if (!delta_avg)
@@ -5188,7 +5189,7 @@ update_tg_cfs_runnable(struct cfs_rq *cfs_rq, struct sched_entity *se, struct cf
/* Set new sched_entity's runnable */
se->avg.runnable_avg = gcfs_rq->avg.runnable_avg;
- new_sum = se->avg.runnable_avg * divider;
+ new_sum = (u64)se->avg.runnable_avg * divider;
delta_sum = (long)new_sum - (long)se->avg.runnable_sum;
se->avg.runnable_sum = new_sum;