Re: Oops on Power8 (was Re: [PATCH v2 1/7] workqueue: make workqueue available early during boot)

From: Tejun Heo
Date: Fri Oct 14 2016 - 11:08:12 EST


Hello, Michael.

On Tue, Oct 11, 2016 at 10:22:13PM +1100, Michael Ellerman wrote:
> The oops happens because we're in enqueue_task_fair() and p->se->cfs_rq
> is NULL.
>
> The cfs_rq is NULL because we did set_task_rq(p, 2048), where 2048 is
> NR_CPUS. That causes us to index past the end of the tg->cfs_rq array in
> set_task_rq() and happen to get NULL.
>
> We never should have done set_task_rq(p, 2048), because 2048 is >=
> nr_cpu_ids, which means it's not a valid CPU number, and set_task_rq()
> doesn't cope with that.

Hmm... it doesn't reproduce here, and I can't see how the commit would
affect this, given that it doesn't really change when the kworker
kthreads are created.

> Presumably we shouldn't be ending up with tsk_cpus_allowed() being
> empty, but I haven't had time to track down why that's happening.

Can you please add WARN_ON_ONCE(!tsk_nr_cpus_allowed(p)) to
select_task_rq() and post what that says?
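
For illustration only, here is a user-space sketch of the failure mode
described above: a search over an empty allowed mask returns the
"no CPU" sentinel, which is not a valid index. This is not kernel code.
NR_CPUS_MODEL, first_cpu() and pick_cpu_checked() are hypothetical names
invented for the sketch, and the mask is modeled as a single unsigned
long rather than a real cpumask.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Stand-in for NR_CPUS (2048 in the report); kept small for the model. */
#define NR_CPUS_MODEL 8

/*
 * Return the lowest set bit in mask, or NR_CPUS_MODEL if no bit is set.
 * This models a mask search that reports "no CPU" with an out-of-range
 * value instead of an error code.
 */
static unsigned int first_cpu(unsigned long mask)
{
	for (unsigned int cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
		if (mask & (1UL << cpu))
			return cpu;
	return NR_CPUS_MODEL;
}

/*
 * Hypothetical guard: pick a CPU from the allowed mask and refuse any
 * value >= nr_cpu_ids, so the caller never indexes a per-CPU array
 * with it. Returns 1 and stores the CPU on success, 0 otherwise.
 */
static int pick_cpu_checked(unsigned long allowed, unsigned int nr_cpu_ids,
			    unsigned int *out)
{
	unsigned int cpu;

	assert(out != NULL);

	cpu = first_cpu(allowed);
	if (cpu >= nr_cpu_ids) {
		fprintf(stderr, "warning: no valid allowed CPU (got %u)\n", cpu);
		return 0;
	}
	*out = cpu;
	return 1;
}
```

With an empty mask, first_cpu() returns NR_CPUS_MODEL and
pick_cpu_checked() rejects it instead of handing it to the caller. That
empty-mask condition is what the WARN_ON_ONCE() above is meant to catch.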

Thanks.

--
tejun