Re: [PATCH v2 1/7] sched/fair: Fix zero_vruntime tracking

From: K Prateek Nayak

Date: Mon Mar 30 2026 - 20:44:05 EST


Hello Peter,

On 3/31/2026 12:41 AM, Peter Zijlstra wrote:
>> Turns out I spoke too soon and it did eventually run into that
>> problem again and then eventually crashed in pick_task_fair()
>> later so there is definitely something amiss still :-(
>>
>> I'll throw in some debug traces and get back tomorrow.
>
> Are there cgroups involved?

Indeed there are.

>
> I'm thinking that if you have two groups, and the tick always hits the
> one group, the other group can go a while without ever getting updated.

Ack! That could be, but I only have one cgroup on top of the root cgroup
as far as CPU controllers are concerned, so the sched_yield() catching up
avg_vruntime() should have worked. Either way, I have more data:

When I hit the overflow warning, I have:

se: entity_key(-83106064385) weight(90891264) overflow(-7553615238018032640)
cfs_rq: zero_vruntime(138430453113448575) sum_w_vruntime(0) sum_weight(0)
cfs_rq->curr: entity_key(0) vruntime(138430453113448575) deadline(138430500540426854)
Post avg_vruntime():
se: entity_key(-83106064385) weight(90891264) overflow(-7553615238018032640)
cfs_rq: zero_vruntime(138430453113448575) sum_w_vruntime(0) sum_weight(0)
cfs_rq->curr: entity_key(0) vruntime(138430453113448575) deadline(138430500540426854)

so running avg_vruntime() doesn't make a difference, and it seems to be a
genuine case of place_entity() putting the newly woken entity pretty
far back in the timeline. (I forgot to print the weights!)
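For what it's worth, the printed overflow(...) value works out to exactly entity_key * weight (assuming that is what the debug trace prints), and that product by itself still fits in a signed 64-bit integer, so a single key*weight term does not wrap on its own; it is the ~83 second lag that looks suspicious. A quick arithmetic check of the numbers above:

```python
# Sanity check of the first warning's numbers (plain arithmetic only).
# Assumption: the printed overflow(...) field is entity_key * weight.
entity_key = -83106064385        # ~ -83 seconds of vruntime lag (ns)
weight = 90891264                # ~86.7 * NICE_0_LOAD (1048576)

product = entity_key * weight
print(product)                   # -7553615238018032640, matches the trace

# The product alone fits in s64, so one key*weight term doesn't wrap:
S64_MAX = 2**63 - 1
print(abs(product) <= S64_MAX)   # True
```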

Now, the funny part is, if I leave the system undisturbed, I get a few
of the above warnings and nothing interesting, but as soon as I do a:

grep bits /sys/kernel/debug/sched/debug

Boom! The pick fails very consistently (because of copy-pasta, this too
doesn't contain the weights):

NULL Pick!
cfs_rq: zero_vruntime(89029406877992895) sum_w_vruntime(-135049248768) sum_weight(1048576)
cfs_rq->curr: entity_key(149162) vruntime(89029406878142057) deadline(89029406976268435)
queued se: entity_key(-123294) vruntime(89029406877869601) deadline(89029406880669601)

after avg_vruntime()!
cfs_rq: zero_vruntime(89029406877868114) sum_w_vruntime(-4206886912) sum_weight(1048576)
cfs_rq->curr: entity_key(273943) vruntime(89029406878142057) deadline(89029406976268435)
queued se: entity_key(1487) vruntime(89029406877869601) deadline(89029406880669601)

NULL Pick!
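The numbers in the two dumps are at least internally consistent: entity_key equals vruntime - zero_vruntime in both, sum_w_vruntime divides out exactly by sum_weight, and zero_vruntime moved back by the same 124781 ns that both entity keys shifted by. A plain arithmetic cross-check:

```python
# Consistency check of the two NULL-pick dumps above (plain arithmetic).
zero1 = 89029406877992895        # zero_vruntime before avg_vruntime()
zero2 = 89029406877868114        # zero_vruntime after avg_vruntime()

curr_vruntime = 89029406878142057
queued_vruntime = 89029406877869601

# entity_key is vruntime relative to zero_vruntime in both dumps:
print(curr_vruntime - zero1, queued_vruntime - zero1)  # 149162 -123294
print(curr_vruntime - zero2, queued_vruntime - zero2)  # 273943 1487

# sum_w_vruntime / sum_weight divides out exactly in both dumps:
print(-135049248768 // 1048576)  # -128793 (before the update)
print(-4206886912 // 1048576)    # -4012 (after the update)

# zero_vruntime moved by the same amount both keys shifted:
print(zero1 - zero2)             # 124781
```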

The above doesn't recover after an avg_vruntime(). Btw, I'm running:

nice -n 19 stress-ng --yield 32 -t 1000000s&
while true; do perf bench sched messaging -p -t -l 100000 -g 16; done

The nice 19 is to get a large deadline and to keep catching up to that
deadline at every yield, to see if that makes any difference.

>
> But if there's no cgroups, this can't be it.
>
> Anyway, something like the below would rule this out I suppose.

I'll add that in and see if it makes a difference. I'll also print the
weights and look at place_entity() to see if anything interesting is
going on there.

Thank you for taking a look.

--
Thanks and Regards,
Prateek