Re: [PATCH RESEND] sched/fair: Fix overflow in vruntime_eligible()
From: Zhan Xusheng
Date: Wed Apr 29 2026 - 03:35:59 EST
Hi Prateek, Hi Peter,
On Wed, 29 Apr 2026, K Prateek Nayak wrote:
> ... the worst case right side would be:
> 57671680000000 * load /* load = ((1 << 28) + 2) */
> and yes that goes beyond 64-bits at 15481123719086080000000.
> ...
> cfs_rq[0]:/
> .sum_w_vruntime : 109061637539400 (46 bits)
> .sum_weight : 14
> .sum_shift : 0
> ...
> without any splat so I'm not sure if there is something that prevents
> a possible crash since a weight of 104857600 should have definitely
> made that entity_key() overflow.
Thanks for running this.
On root's sum_weight being 14 despite cg1 having se->load.weight =
104857600: sum_weight is incremented by __enqueue_entity() only for
entities currently on_rq on this cfs_rq. In your snapshot only cg0
(weight 14) is on_rq on root; cg1's group_se is not, so its weight
does not contribute to root's sum_weight.
If a workload kept both group_ses on_rq on root simultaneously,
root's sum_weight would be 14 + 104857600. I have not constructed
a workload that reliably holds both on_rq long enough to observe a
sum_shift bump; that is outside what I can test in my current
environment.
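As a toy illustration of that accounting (plain Python, not kernel code;
ToyCfsRq and its methods are hypothetical names, and the model ignores
everything except the weight sum):

```python
class ToyCfsRq:
    """Toy stand-in for a cfs_rq that tracks only sum_weight."""

    def __init__(self):
        self.sum_weight = 0
        self.queued = {}          # name -> weight, for entities that are on_rq

    def enqueue(self, name, weight):
        # Only an enqueued entity adds its weight to the sum.
        self.queued[name] = weight
        self.sum_weight += weight

    def dequeue(self, name):
        self.sum_weight -= self.queued.pop(name)


root = ToyCfsRq()
root.enqueue("cg0", 14)
snapshot_sum = root.sum_weight        # cg1 exists but is not on_rq

root.enqueue("cg1", 104857600)
both_sum = root.sum_weight            # both group_ses on_rq at once

print(snapshot_sum, both_sum)
```

The first value matches the sum_weight in the snapshot; the second is the
14 + 104857600 case that I have not been able to hold on a real system.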
On the upper-bound analysis, I re-checked both of your derivations:
- Peter's (slice + TICK_NSEC) * NICE_0_LOAD * NICE_0_LOAD * 100
evaluates to ~1.87e20 with HZ=1000 and the 64-bit NICE_0_LOAD
shift of 20, using 100 * NICE_0_LOAD as the heavy-side sum_weight.
- Prateek's refinement uses load = (1 << 28) + 2, giving ~1.55e22.
(1 << 28) corresponds to scale_load(MAX_SHARES) and is reachable
via the cgroup v1 cpu.shares interface, but is not reachable
through cpu.weight alone, which caps load.weight at ~1.05e8 via
sched_weight_from_cgroup(CGROUP_WEIGHT_MAX).
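For anyone who wants to redo the arithmetic, here is a small sketch using
Python's unbounded integers. SLICE_NSEC = 700000 is my assumption (a 0.7 ms
slice); it is the value that reproduces the ~1.87e20 figure. The other
constants are the ones quoted in this thread.

```python
S64_MAX = (1 << 63) - 1

NICE_0_LOAD = 1 << 20        # 64-bit NICE_0_LOAD shift of 20
TICK_NSEC = 1_000_000        # HZ=1000
SLICE_NSEC = 700_000         # assumption: 0.7 ms slice

# Peter's bound, with 100 * NICE_0_LOAD as the heavy-side sum_weight.
peter_bound = (SLICE_NSEC + TICK_NSEC) * NICE_0_LOAD * NICE_0_LOAD * 100

# Prateek's bound, using the key and load quoted above.
prateek_bound = 57_671_680_000_000 * ((1 << 28) + 2)

for name, bound in (("peter", peter_bound), ("prateek", prateek_bound)):
    print(f"{name}: {bound:.3e} exceeds S64_MAX: {bound > S64_MAX}")
```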
Both bounds cross S64_MAX. Working backwards from the smaller
bound: with Peter's puny.key ~1.78e12, the multiplication crosses
S64_MAX once load exceeds ~5.2e6, which a single cpu.weight=10000
cgroup clears on its own (load.weight ~1.05e8). So in cgroup-only
setups the overflow threshold on load is low; the reason your
snapshot stayed well under it appears to be the on_rq pattern, not
the weight cap.
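The threshold can be checked the same way. In this sketch the key again
assumes a 0.7 ms slice with HZ=1000, and the cpu.weight conversion
(weight * 1024 / 100, then a shift by 10 for scale_load) is my reading of
sched_weight_from_cgroup(), so treat it as an assumption; it does reproduce
the 104857600 seen in the snapshot.

```python
S64_MAX = (1 << 63) - 1

# puny.key = (slice + TICK_NSEC) * NICE_0_LOAD, assuming a 0.7 ms slice.
key = (700_000 + 1_000_000) * (1 << 20)

# Largest load for which key * load still fits in s64.
load_threshold = S64_MAX // key

# load.weight for cpu.weight=10000 (assumed conversion, see above).
cgroup_load = (10000 * 1024 // 100) << 10

print(f"key            = {key:.3e}")
print(f"load threshold = {load_threshold:.3e}")
print(f"cgroup load    = {cgroup_load} "
      f"({cgroup_load / load_threshold:.1f}x the threshold)")
```

So a single cpu.weight=10000 group sits roughly 20x above the point where
key * load stops fitting in 64 bits for that key.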
To be upfront about the patch's basis:
- I do not have a reproducer.
- I have not observed the WARN_ON() in __enqueue_entity().
- The patch came from static inspection of vruntime_eligible()
after 556146ce5e94. The observation is that key * load
overflows specifically when *both* key is large (the entity
being checked is far from v0) and load is large (an unrelated
heavy entity is on_rq). The v0-move change keeps individual
(v_i - v0) * w_i terms in sum_w_vruntime bounded, but key and
load in vruntime_eligible() refer to different entities, so v0
placement cannot simultaneously control both factors.
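To show why a wrapped product matters, here is a userspace model of the
comparison. It assumes the check has the shape avg >= key * load and that
the s64 multiply wraps in two's complement; the function names are
hypothetical and the inputs are illustrative values taken from this thread,
not a consistent runqueue state.

```python
U64_MASK = (1 << 64) - 1


def wrap_s64(x):
    """Reduce an exact integer to what a wrapping signed 64-bit value holds."""
    x &= U64_MASK
    return x - (1 << 64) if x >= (1 << 63) else x


def eligible_exact(avg, key, load):
    # Comparison carried out with unbounded integers.
    return avg >= key * load


def eligible_s64(avg, key, load):
    # Same comparison, but the product is wrapped to s64 first.
    return avg >= wrap_s64(key * load)


avg = 109061637539400                      # sum_w_vruntime from the snapshot
key = (700_000 + 1_000_000) * (1 << 20)    # ~1.78e12, assumes a 0.7 ms slice
load = (1 << 28) + 2                       # heavy unrelated entity on_rq

print(eligible_exact(avg, key, load), eligible_s64(avg, key, load))
```

With these inputs the exact comparison says "not eligible" while the wrapped
one says "eligible", because the wrapped product comes out negative.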
Given the bounds above, I think the defensive check is justified
as hardening against a provable overflow, with the acknowledgement
that I have not triggered it in practice. I'm happy to respin with
a commit message built on these analyses if that would help.
Thanks,
Zhan Xusheng