Re: [PATCH RESEND] sched/fair: Fix overflow in vruntime_eligible()
From: K Prateek Nayak
Date: Wed Apr 29 2026 - 01:04:10 EST
Hello Peter,
On 4/28/2026 11:05 PM, Peter Zijlstra wrote:
> On Tue, Apr 28, 2026 at 09:47:11PM +0530, K Prateek Nayak wrote:
>> (+ scheduler folks)
>>
>> Hello Zhan,
>>
>> On 4/28/2026 8:19 PM, Zhan Xusheng wrote:
>>> After commit 556146ce5e94 ("sched/fair: Avoid overflow in
>>> enqueue_entity()"), place_entity() can shift cfs_rq->zero_vruntime
>>> towards a newly enqueued heavy entity. This can make (vruntime -
>>> zero_vruntime) very large for other entities and cause key * load in
>>> vruntime_eligible() to overflow s64, flipping the eligibility result.
>>
>> So the commit in question moves the zero_vruntime only when the
>> load > sum_weight.
>>
>> You seem to have found a case where the entity_key() is already large
>> enough that moving the zero_vruntime farther will make the eligibility
>> check overflow which we were hoping will not be the case.
>>
>> Do you have a reproducer that fails pick_eevdf() after introduction of
>> commit 556146ce5e94? Also, do you see any splats in the dmesg since we
>> have a defensive WARN_ON() to catch an overflow.
>
> Right, either a reproducer or a trace showing the values leading up to
> this are the absolute minimum you need to provide. That is, you need a
> definite description of how you got there, otherwise you cannot judge
> the solution is sound.
>
> Anyway... let me construct a worst case.
>
> So if you have this cgroup crap on, then you can have an entity of
> weight 2, and vlag should then be bounded by: (slice+TICK_NSEC) *
> NICE_0_LOAD, which is around 44 bits as per the comment on entity_key().
>
> The other end is 100*NICE_0_LOAD, so lets wake that, then you get:
>
> {key, weight}[] := {
> puny: { (slice + TICK_NSEC) * NICE_0_LOAD, 2 },
> max: { 0, 100*NICE_0_LOAD },
> }
So MAX_SHARES is (1UL << 18) and we then scale it up so the weight can
can go up to (1 << 28) I suppose in UP setting.
So let us assume there is a single task with the largest custom slice on
each cgroup making the min_slice equal to 100ms. Things start off in
such a way that puny having a vruntime of say -1 is picked over max which
had a vruntime of 0 for the sake of simplicity.
Deadline then becomes:
puny->deadline = -1 + __calc_delta(100ms, NICE_0_LOAD, 2)
which is 52428800000000 - 1.
At 10HZ tick + RUN_TO_PARITY, we get 10ms error so that puts the
entity_key() at 57671680000000 (46 bits)
>
> The avg_vruntime() would end up being very close to 0 (which is
> zero_vruntime), so no real help making that more accurate.
>
> vruntime_eligible(puny) ends up with:
>
> avg = 2 * puny.key (+ 0)
> weight = 2 + 100 * NICE_0_LOAD
>
> avg >= puny.key * weight
>
> And that is: (slice + TICK_NSEC) * NICE_0_LOAD * NICE_0_LOAD * 100
> > and yes, I suppose that can exceed 64bit :-(
Then the worst case right side would be:
57671680000000 * load /* load = ((1 << 28) + 2) */
and yes that goes beyond 64-bits at 15481123719086080000000.
I'm running with this configurations:
echo 2 > /sys/fs/cgroup/cg0/cpu.weight
echo 10000 > /sys/fs/cgroup/cg1/cpu.weight
Weight of 2 actually gets scaled up to 20480 for UP scenario so I go try
to make it even smaller with SMP and wait until it gets picked again.
So it goes from ...
cfs_rq[0]:/cg0
.se->vruntime : 11602839.678164
.se->deadline : 19092472.390575
.se->sum_exec_runtime : 4243.207300
.se->slice : 100.000000
.se->custom_slice : 1
.se->load.weight : 14
.se->avg.load_avg : 2
.se->avg.util_avg : 0
.se->avg.runnable_avg : 1024
cfs_rq[0]:/cg1
.se->vruntime : 11602835.026387
.se->deadline : 11602835.706227
.se->sum_exec_runtime : 2646719.706164
.se->slice : 100.000000
.se->custom_slice : 1
.se->load.weight : 104857600
.se->avg.load_avg : 102389
.se->avg.util_avg : 1024
.se->avg.runnable_avg : 1024
cfs_rq[0]:/
.left_deadline : 19092472.390575
.left_vruntime : 11602839.678164
.zero_vruntime : 11602835.026387
.sum_w_vruntime : 65124878 (25 bits)
.sum_weight : 14
.sum_shift : 0
.avg_vruntime : 11602835.026387
.right_vruntime : 11602839.678164
.spread : 0.000000
... to ...
cfs_rq[0]:/cg0
.se->vruntime : 19392969.031442
.se->deadline : 26881950.428361
.se->sum_exec_runtime : 4347.216748
.se->slice : 100.000000
.se->custom_slice : 1
.se->load.weight : 14
.se->avg.load_avg : 1
.se->avg.util_avg : 904
.se->avg.runnable_avg : 1024
cfs_rq[0]:/cg1
.se->vruntime : 11602851.024250
.se->deadline : 11602851.144214
.se->sum_exec_runtime : 2648319.534241
.se->slice : 100.000000
.se->custom_slice : 1
.se->load.weight : 104857600
.se->avg.load_avg : 102386
.se->avg.util_avg : 1024
.se->avg.runnable_avg : 1024
cfs_rq[0]:/
.left_deadline : 26881950.428361
.left_vruntime : 19392969.031442
.zero_vruntime : 11602852.064342
.sum_w_vruntime : 109061637539400 (46 bits)
.sum_weight : 14
.sum_shift : 0
.avg_vruntime : 11602852.064342
.right_vruntime : 19392969.031442
.spread : 0.000000
without any splat so I'm not sure if there is something that prevents
a possible crash since a weight of 104857600 should have definitely
made that entity_key() overflow.
>
> Can someone double check that? I have a silly head-ache and could easily
> have made mistake. Not sure what the best solution is either. I think we
> can show avg cannot overflow, and if this multiplication overflows, then
> it must be larger, so perhaps the proposed patch is the best option.
>
> Dunno, need clear head :/
--
Thanks and Regards,
Prateek