Re: [PATCH] sched/fair: Add null pointer check to pick_next_entity()

From: Rik van Riel
Date: Wed Apr 02 2025 - 11:00:17 EST


On Mon, 2025-03-24 at 12:56 +0100, Peter Zijlstra wrote:
> On Thu, Mar 20, 2025 at 01:53:10PM -0700, Pat Cody wrote:
> > pick_eevdf() can return null, resulting in a null pointer dereference
> > crash in pick_next_entity()
>
> If it returns NULL while nr_queued, something is really badly wrong.
>
> Your check will hide this badness.

Looking at the numbers, I suspect vruntime_eligible()
is simply not allowing us to run the left-most entity
in the rb tree.

At the root level we are seeing these numbers:

*(struct cfs_rq *)0xffff8882b3b80000 = {
	.load = (struct load_weight){
		.weight = (unsigned long)4750106,
		.inv_weight = (u32)0,
	},
	.nr_running = (unsigned int)3,
	.h_nr_running = (unsigned int)3,
	.idle_nr_running = (unsigned int)0,
	.idle_h_nr_running = (unsigned int)0,
	.h_nr_delayed = (unsigned int)0,
	.avg_vruntime = (s64)-2206158374744070955,
	.avg_load = (u64)4637,
	.min_vruntime = (u64)12547674988423219,

Meanwhile, the cfs_rq->curr entity has a weight of
4699124, a vruntime of 12071905127234526, and a
vlag of -2826239998.

The left-most entity in the cfs_rq has a weight
of 107666, a vruntime of 16048555717648580,
and a vlag of -1338888.

I cannot for the life of me figure out how the
avg_vruntime number is so out of whack from what
the vruntime numbers of the sched entities on the
runqueue look like.

The avg_vruntime code is confusing me. On the
one hand the vruntime number is multiplied by
the sched entity weight when adding to or
subtracting from avg_vruntime, but on the other
hand vruntime_eligible() scales the comparison
by the cfs_rq->avg_load number.
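
For reference, here is how I read that relationship, as a small
userspace sketch rather than the actual kernel code. The toy_* names
and the tiny numbers are made up for illustration and do not come
from the dump above. The assumption is that avg_vruntime holds the
weighted sum of (vruntime - min_vruntime) rather than the average
itself, so comparing it against key * avg_load is the same test as
"key <= weighted average key" without doing a division:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for sched_entity and cfs_rq. */
struct toy_entity {
	uint64_t vruntime;
	int64_t weight;		/* assumed already scaled down */
};

struct toy_rq {
	int64_t avg_vruntime;	/* sum of key_i * weight_i */
	int64_t avg_load;	/* sum of weight_i */
	uint64_t min_vruntime;
};

/* Key of a vruntime relative to min_vruntime; may be negative. */
static int64_t toy_key(const struct toy_rq *rq, uint64_t vruntime)
{
	return (int64_t)(vruntime - rq->min_vruntime);
}

/* Adding an entity adds its weighted key and its weight. */
static void toy_add(struct toy_rq *rq, const struct toy_entity *se)
{
	rq->avg_vruntime += toy_key(rq, se->vruntime) * se->weight;
	rq->avg_load += se->weight;
}

/* Eligible when key <= avg_vruntime / avg_load, written without the division. */
static int toy_eligible(const struct toy_rq *rq, uint64_t vruntime)
{
	return rq->avg_vruntime >= toy_key(rq, vruntime) * rq->avg_load;
}

/* Two entities: a heavy one at key 0 and a light one at key 400. */
void toy_example(void)
{
	struct toy_rq rq = { .min_vruntime = 1000 };
	struct toy_entity heavy = { .vruntime = 1000, .weight = 3 };
	struct toy_entity light = { .vruntime = 1400, .weight = 1 };

	toy_add(&rq, &heavy);
	toy_add(&rq, &light);

	/* Weighted sum is 0*3 + 400*1 = 400, load is 4, average key is 100. */
	assert(rq.avg_vruntime == 400 && rq.avg_load == 4);
	assert(toy_eligible(&rq, 1000));	/* key 0   <= 100 */
	assert(toy_eligible(&rq, 1100));	/* key 100 <= 100 */
	assert(!toy_eligible(&rq, 1400));	/* key 400 >  100 */
}
```

In this toy model an entity stops being eligible as soon as its key
exceeds the weighted average, so a weighted sum that is far more
negative than any key on the runqueue would make every queued entity
look ineligible.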

What even protects the load number in vruntime_eligible
from going negative in certain cases, when the current
entity's entity_key is a negative value?

The latter is probably not the bug we're seeing now, but
I don't understand how that is supposed to behave.


--
All Rights Reversed.