Re: [PATCH v2 1/7] sched/fair: Fix zero_vruntime tracking
From: K Prateek Nayak
Date: Mon Mar 30 2026 - 20:44:05 EST
Hello Peter,
On 3/31/2026 12:41 AM, Peter Zijlstra wrote:
>> Turns out I spoke too soon and it did eventually run into that
>> problem again and then eventually crashed in pick_task_fair()
>> later so there is definitely something amiss still :-(
>>
>> I'll throw in some debug traces and get back tomorrow.
>
> Are there cgroups involved?
Indeed there are.
>
> I'm thinking that if you have two groups, and the tick always hits the
> one group, the other group can go a while without ever getting updated.
Ack! That could be it, but I only have one cgroup on top of the root cgroup as
far as the cpu controller is concerned, so the sched_yield() catching up
via avg_vruntime() should have worked. Either way, I have more data:
When I hit the overflow warning, I have:
se: entity_key(-83106064385) weight(90891264) overflow(-7553615238018032640)
cfs_rq: zero_vruntime(138430453113448575) sum_w_vruntime(0) sum_weight(0)
cfs_rq->curr: entity_key(0) vruntime(138430453113448575) deadline(138430500540426854)
Post avg_vruntime():
se: entity_key(-83106064385) weight(90891264) overflow(-7553615238018032640)
cfs_rq: zero_vruntime(138430453113448575) sum_w_vruntime(0) sum_weight(0)
cfs_rq->curr: entity_key(0) vruntime(138430453113448575) deadline(138430500540426854)
so running avg_vruntime() doesn't make a difference and it seems to be a
genuine case of place_entity() putting the newly woken entity pretty
far back in the timeline. (I forgot to print weights!)
Now, the funny part is, if I leave the system undisturbed, I get a few
of the above warning and nothing interesting but as soon as I do a:
grep bits /sys/kernel/debug/sched/debug
Boom! The pick fails very consistently (because of copy-pasta, this too
doesn't contain weights):
NULL Pick!
cfs_rq: zero_vruntime(89029406877992895) sum_w_vruntime(-135049248768) sum_weight(1048576)
cfs_rq->curr: entity_key(149162) vruntime(89029406878142057) deadline(89029406976268435)
queued se: entity_key(-123294) vruntime(89029406877869601) deadline(89029406880669601)
after avg_vruntime()!
cfs_rq: zero_vruntime(89029406877868114) sum_w_vruntime(-4206886912) sum_weight(1048576)
cfs_rq->curr: entity_key(273943) vruntime(89029406878142057) deadline(89029406976268435)
queued se: entity_key(1487) vruntime(89029406877869601) deadline(89029406880669601)
NULL Pick!
The above doesn't recover after an avg_vruntime(). Btw I'm running:
nice -n 19 stress-ng --yield 32 -t 1000000s&
while true; do perf bench sched messaging -p -t -l 100000 -g 16; done
The nice 19 is to get a large deadline and keep catching up to that deadline
at every yield, to see if that makes any difference.
>
> But if there's no cgroups, this can't be it.
>
> Anyway, something like the below would rule this out I suppose.
I'll add that in and see if it makes a difference. I'll add in
weights and look at place_entity() to see if we have anything
interesting going on there.
Thank you for taking a look.
--
Thanks and Regards,
Prateek