Re: Oops on Power8 (was Re: [PATCH v2 1/7] workqueue: make workqueue available early during boot)

From: Balbir Singh
Date: Fri Oct 14 2016 - 21:27:46 EST

On 15/10/16 02:07, Tejun Heo wrote:
> Hello, Michael.
>
> On Tue, Oct 11, 2016 at 10:22:13PM +1100, Michael Ellerman wrote:
>> The oops happens because we're in enqueue_task_fair() and p->se.cfs_rq
>> is NULL.
>>
>> The cfs_rq is NULL because we did set_task_rq(p, 2048), where 2048 is
>> NR_CPUS. That causes us to index past the end of the tg->cfs_rq array in
>> set_task_rq() and happen to get NULL.
>>
>> We never should have done set_task_rq(p, 2048), because 2048 is >=
>> nr_cpu_ids, which means it's not a valid CPU number, and set_task_rq()
>> doesn't cope with that.
>
> Hmm... it doesn't reproduce here, and I can't see how the commit would
> affect this given that it doesn't really change when the kworker
> kthreads are being created.
>
>> Presumably we shouldn't be ending up with tsk_cpus_allowed() being
>> empty, but I haven't had time to track down why that's happening.
>

I think the basic analysis points to the change in how unbound
workqueues are created from the unbound_hash: the pools found there
have an empty cpumask.
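A minimal user-space model of the failure chain in the quoted analysis, assuming nothing beyond what the thread states: an empty allowed mask yields no valid CPU, and the resulting number (NR_CPUS, 2048 here) is then used as an index into a per-CPU array sized for fewer CPUs. All names (struct model_cpumask, model_mask_first(), model_task_cfs_rq(), MODEL_NR_CPU_IDS) are hypothetical; this is a sketch, not kernel code. model_mask_first() follows the find_first_bit() convention of returning the mask size when no bit is set, and model_task_cfs_rq() adds the bounds check that, per the analysis, set_task_rq() lacks.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * User-space model only, not kernel code.
 * NR_CPUS mirrors the value seen in the oops; MODEL_NR_CPU_IDS is a
 * hypothetical number of possible CPUs, smaller than NR_CPUS.
 */
#define NR_CPUS          2048u
#define MODEL_NR_CPU_IDS 8u
#define WORD_BITS        64u

/* Hypothetical stand-in for struct cpumask: one bit per CPU. */
struct model_cpumask {
	uint64_t bits[NR_CPUS / WORD_BITS];
};

/* Like find_first_bit(): first set CPU, or NR_CPUS if the mask is empty. */
static unsigned int model_mask_first(const struct model_cpumask *m)
{
	for (unsigned int cpu = 0; cpu < NR_CPUS; cpu++)
		if (m->bits[cpu / WORD_BITS] & ((uint64_t)1 << (cpu % WORD_BITS)))
			return cpu;
	return NR_CPUS;
}

/* Hypothetical per-CPU run queues, sized by the CPU count, not NR_CPUS. */
struct model_cfs_rq {
	unsigned int cpu;
};
static struct model_cfs_rq model_cfs_rqs[MODEL_NR_CPU_IDS];

/*
 * Bounds-checked lookup. Without this check, an out-of-range cpu would
 * index past the end of model_cfs_rqs, as described for set_task_rq().
 */
static struct model_cfs_rq *model_task_cfs_rq(unsigned int cpu)
{
	if (cpu >= MODEL_NR_CPU_IDS)
		return NULL;	/* invalid CPU number: refuse to index */
	model_cfs_rqs[cpu].cpu = cpu;
	return &model_cfs_rqs[cpu];
}
```

With an empty mask the model returns 2048 and refuses the lookup. Without the check, the same index would read past the end of the array, which is undefined behaviour in C and, in the reported oops, happened to yield NULL.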

> Can you please add WARN_ON_ONCE(!tsk_nr_cpus_allowed(p)) to
> select_task_rq() and post what that says?
>
> Thanks.
>
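Tejun's suggested check would fire the first time select_task_rq() sees a task with no allowed CPUs. A user-space sketch of that placement follows, assuming a simplified CPU choice ("first allowed CPU") and at most 64 CPUs. MODEL_WARN_ON_ONCE, struct model_task and model_select_task_rq() are hypothetical, and only the warn-once behaviour is modelled; the kernel's WARN_ON_ONCE() also prints a backtrace, which is what would show where the empty mask came from.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define MODEL_NR_CPU_IDS 64u	/* hypothetical: one 64-bit word of CPUs */

static unsigned int model_warn_count;	/* number of warnings printed */

/*
 * Hypothetical user-space stand-in for WARN_ON_ONCE(): report the first
 * time 'cond' is true at this call site, then stay silent. Unlike the
 * kernel macro, it prints no stack trace and yields no value.
 */
#define MODEL_WARN_ON_ONCE(cond)					\
	do {								\
		static bool model_warned_;				\
		if ((cond) && !model_warned_) {				\
			model_warned_ = true;				\
			model_warn_count++;				\
			fprintf(stderr, "WARNING: %s at %s:%d\n",	\
				#cond, __FILE__, __LINE__);		\
		}							\
	} while (0)

/* Hypothetical task: a mask of allowed CPUs and how many bits it has set. */
struct model_task {
	unsigned int nr_cpus_allowed;
	uint64_t cpus_allowed;
};

/* Pick the first allowed CPU, or MODEL_NR_CPU_IDS if none is allowed. */
static unsigned int model_select_task_rq(const struct model_task *p)
{
	/* The debug check suggested in the thread. */
	MODEL_WARN_ON_ONCE(!p->nr_cpus_allowed);

	for (unsigned int cpu = 0; cpu < MODEL_NR_CPU_IDS; cpu++)
		if (p->cpus_allowed & ((uint64_t)1 << cpu))
			return cpu;
	return MODEL_NR_CPU_IDS;
}
```

Calling model_select_task_rq() twice with an empty mask prints one warning, and both calls return the out-of-range value 64, the model's analogue of the 2048 in the oops.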

Balbir Singh.