Re: NULL pointer dereference in pick_next_task_fair

From: Peter Zijlstra
Date: Thu Nov 07 2019 - 13:44:29 EST


On Thu, Nov 07, 2019 at 03:38:48PM +0000, Quentin Perret wrote:
> On Thursday 07 Nov 2019 at 14:26:28 (+0100), Peter Zijlstra wrote:
> > Given that we're stuck with this order, the only solution is fixing
> > the 'change' pattern. The simplest fix seems to be to 'abuse'
> > p->on_cpu to carry more state. Adding more state to p->on_rq is
> > possible but is far more invasive and also ends up duplicating much of
> > the state we already carry in p->on_cpu.
>
> I think there is another solution, which is to 'de-factorize' the call
> to put_prev_task() (that is, have each class do it). I gave it a go and
> I basically end up with something equivalent to reverting 67692435c411
> ("sched: Rework pick_next_task() slow-path"), which isn't the worst
> solution IMO. I'm thinking at least we should consider it.

The purpose of 67692435c411 is to get rid of the RETRY_TASK logic
restarting the pick.

But you mean something like:

	for (class = prev->sched_class; class; class = class->next) {
		if (class->balance(rq, rf))
			break;
	}

	put_prev_task(rq, prev);

	for_each_class(class) {
		p = class->pick_next_task(rq);
		if (p)
			return p;
	}

	BUG();

like?

I had convinced myself we didn't need that, but that DL to RT case is
pesky and might require it after all.
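
For anyone who wants to poke at the two-pass shape outside the kernel, here is a
rough userspace toy of the sketch above. Everything in it is made up for
illustration: the struct layouts, the 'runnable' flag, the 'nr_put_prev' counter
and the three-class chain are assumptions, not the real scheduler types, and
balance() takes no rq_flags here. It only shows the ordering: balance from
prev's class downwards, then put_prev_task(), then pick from the top.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: none of these types match the kernel's definitions. */
struct rq;

struct sched_class {
	const struct sched_class *next;			/* next lower-priority class */
	int (*balance)(struct rq *rq);			/* nonzero: this class has work */
	struct task *(*pick_next_task)(struct rq *rq);	/* NULL: nothing to run */
};

struct task {
	const struct sched_class *sched_class;
	int runnable;					/* hypothetical flag */
};

struct rq {
	struct task *rt_task;	/* at most one task per class, for brevity */
	struct task *fair_task;
	struct task idle;
	int nr_put_prev;	/* counts put_prev_task() calls, illustration only */
};

static int rt_balance(struct rq *rq)
{
	return rq->rt_task && rq->rt_task->runnable;
}

static struct task *rt_pick(struct rq *rq)
{
	return rt_balance(rq) ? rq->rt_task : NULL;
}

static int fair_balance(struct rq *rq)
{
	return rq->fair_task && rq->fair_task->runnable;
}

static struct task *fair_pick(struct rq *rq)
{
	return fair_balance(rq) ? rq->fair_task : NULL;
}

/* The idle class always has something to run. */
static int idle_balance(struct rq *rq) { (void)rq; return 1; }
static struct task *idle_pick(struct rq *rq) { return &rq->idle; }

/* Priority chain: rt -> fair -> idle. */
static const struct sched_class idle_class = { NULL, idle_balance, idle_pick };
static const struct sched_class fair_class = { &idle_class, fair_balance, fair_pick };
static const struct sched_class rt_class = { &fair_class, rt_balance, rt_pick };

static void put_prev_task(struct rq *rq, struct task *prev)
{
	(void)prev;
	rq->nr_put_prev++;
}

static struct task *pick_next_task(struct rq *rq, struct task *prev)
{
	const struct sched_class *class;
	struct task *p;

	/* Pass 1: balance from prev's class down; stop at the first with work. */
	for (class = prev->sched_class; class; class = class->next) {
		if (class->balance(rq))
			break;
	}

	/* prev is put exactly once, after balancing and before any pick. */
	put_prev_task(rq, prev);

	/* Pass 2: pick from the highest class; no restart is needed. */
	for (class = &rt_class; class; class = class->next) {
		p = class->pick_next_task(rq);
		if (p)
			return p;
	}

	assert(!"unreachable: the idle class always returns a task");
	return NULL;
}
```

Calling pick_next_task(&rq, prev) with prev being a no-longer-runnable RT task
and a runnable fair task falls through to the fair task; with nothing runnable
it lands on idle.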

> Now, 67692435c411 _is_ a nice clean-up, it's just a shame that the fix
> on top isn't as nice (IMO). It might just be a matter of personal taste,
> so I don't have a strong opinion on this :)

Yeah, it does rather make a mess of things.

I'll try and code up the above after dinner.