Re: NULL pointer dereference in pick_next_task_fair

From: Valentin Schneider
Date: Wed Oct 30 2019 - 21:33:33 EST


On 30/10/2019 23:50, Ram Muthiah wrote:
>
> Quentin and I were able to create a setup which reproduces the issue.
>
> Given this, I tried Peter's proposed fix and was still able to reproduce the
> issue unfortunately. Current patch is located here -
> https://android-review.googlesource.com/c/kernel/common/+/1153487
>
> Our mitigation for this issue on the android-mainline branch has been to
> revert 67692435c411 ("sched: Rework pick_next_task() slow-path").
> https://android-review.googlesource.com/c/kernel/common/+/1152564
>
> I'll spend some time detailing repro steps next. I should be able to
> provide an update on those details early next week.
>
> We appreciate the help so far.
> Thanks,
> Ram
>

The splat Quentin posted happens at secondary startup; is that always
the case? I'm trying to think of what could make rq->cfs.nr_running
non-zero at secondary bringup time. It might not explain the NULL pointer, but
I'm still curious as to how we can get something there this early, as it could
point towards the underlying bug. Be warned, I might bring up stuff I know
nothing about, but this looks "fun" so I can't help myself :)


sched domains are only set up after smp_init(), in sched_init_smp(), i.e. after
we've booted all secondaries. This should take load balancing out of the
picture.
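To make the load-balancing point concrete, here is a toy C model. It is not kernel code and every name in it (toy_rq, toy_domain, toy_load_balance) is hypothetical. It assumes, as above, that balancing only walks a CPU's sched domain hierarchy, so a CPU whose domain pointer is still NULL (as before sched_init_smp()) never pulls anything.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NR_CPUS 4

/* Hypothetical stand-in for a sched domain: only the span matters here. */
struct toy_domain {
	unsigned int span;          /* bitmask of CPUs covered by this level */
	struct toy_domain *parent;  /* next level up, NULL at the top */
};

/* Hypothetical stand-in for a per-CPU runqueue. */
struct toy_rq {
	unsigned int nr_running;    /* runnable CFS tasks on this CPU */
	struct toy_domain *sd;      /* NULL until domains are built */
};

static struct toy_rq toy_rqs[TOY_NR_CPUS];

/*
 * Toy load balance: walk this CPU's domains and, at each level, pull at
 * most one task from the busiest other CPU in the span, provided it has at
 * least two more tasks than this CPU. With no domains attached the loop
 * body never runs, so nothing is migrated. Returns the number pulled.
 */
static unsigned int toy_load_balance(int this_cpu)
{
	unsigned int pulled = 0;

	assert(this_cpu >= 0 && this_cpu < TOY_NR_CPUS);

	for (struct toy_domain *sd = toy_rqs[this_cpu].sd; sd; sd = sd->parent) {
		int busiest = -1;

		for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++) {
			if (cpu == this_cpu || !(sd->span & (1u << cpu)))
				continue;
			if (busiest < 0 ||
			    toy_rqs[cpu].nr_running > toy_rqs[busiest].nr_running)
				busiest = cpu;
		}

		if (busiest >= 0 &&
		    toy_rqs[busiest].nr_running > toy_rqs[this_cpu].nr_running + 1) {
			toy_rqs[busiest].nr_running--;
			toy_rqs[this_cpu].nr_running++;
			pulled++;
		}
	}
	return pulled;
}
```

With CPU0 loaded and no domain attached to CPU1, toy_load_balance(1) moves nothing; attaching a single domain spanning all CPUs lets it pull one task.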

For wakeups, select_task_rq_fair() can only ever pick prev_cpu or this_cpu,
since there are no sched domains. I don't see many candidates that could
wake up on a secondary CPU (i.e. with a non-zero this_cpu) this early. Perhaps
the smpboot threads, but from a quick look they are first created *after*
sched_init_smp(), so they couldn't exist during (boot-time) secondary bringup.
The same seems to hold for IRQ threads (and they get switched to SCHED_FIFO
anyway).
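The wakeup argument boils down to one invariant: with no domains, the chosen CPU is always one of two candidates, so a secondary's rq only gains a CFS task if that task last ran there or is woken from there. Below is a toy C sketch of that invariant. It is not the kernel's select_task_rq_fair(); the names are hypothetical, and which of the two candidates wins is left as a parameter (pick_waker), since the point is only that no third CPU is ever returned.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_CPUS 4

/*
 * Hypothetical toy model of wakeup placement without sched domains:
 * the only possible results are prev_cpu or this_cpu.
 */
static int toy_select_cpu_no_domains(int prev_cpu, int this_cpu, bool pick_waker)
{
	assert(prev_cpu >= 0 && prev_cpu < TOY_NR_CPUS);
	assert(this_cpu >= 0 && this_cpu < TOY_NR_CPUS);
	return pick_waker ? this_cpu : prev_cpu;
}

/*
 * Count how many runnable tasks each CPU ends up with, given every task's
 * previous CPU, the CPU performing its wakeup, and which candidate is used.
 */
static void toy_enqueue_all(const int *prev_cpu, const int *waker_cpu,
			    const bool *pick_waker, int nr_tasks,
			    unsigned int nr_running[TOY_NR_CPUS])
{
	for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		nr_running[cpu] = 0;

	for (int i = 0; i < nr_tasks; i++) {
		int cpu = toy_select_cpu_no_domains(prev_cpu[i], waker_cpu[i],
						    pick_waker[i]);
		nr_running[cpu]++;
	}
}
```

In this model, if every task last ran on CPU0 and is woken from CPU0, the secondaries' counts stay at zero whichever candidate is picked; a secondary only gets a task if one of those two inputs already points at it.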

So now I'm even more curious as to what CFS task could be enqueued on a
secondary CPU's rq before sched_init_smp(). Have you been sending stuff to
space without any shielding lately?