Re: [PATCH 1/4] workqueue: Reap workers via kthread_stop() and remove detach_completion

From: Lai Jiangshan
Date: Tue Sep 10 2024 - 23:23:33 EST


Hello, Marc

On Wed, Sep 11, 2024 at 12:29 AM Marc Hartmayer <mhartmay@xxxxxxxxxxxxx> wrote:
> Code starting with the faulting instruction
> ===========================================
> 000002d8c205ef20: a7180000 lhi %r1,0
> #000002d8c205ef24: 582083ac l %r2,940(%r8)
> >000002d8c205ef28: ba12a000 cs %r1,%r2,0(%r10)
> 000002d8c205ef2c: a77400cf brc 7,000002d8c205f0ca
> 000002d8c205ef30: 5800b078 l %r0,120(%r11)
> 000002d8c205ef34: a7010002 tmll %r0,2
> 000002d8c205ef38: a77400d4 brc 7,000002d8c205f0e0
> [ 14.271766] Call Trace:
> [ 14.271769] worker_thread (./arch/s390/include/asm/atomic_ops.h:198 ./arch/s390/include/asm/spinlock.h:61 ./arch/s390/include/asm/spinlock.h:66 ./include/linux/spinlock.h:187 ./include/linux/spinlock_api_smp.h:120 kernel/workqueue.c:3346)
> [ 14.271774] worker_thread (./arch/s390/include/asm/lowcore.h:226 ./arch/s390/include/asm/spinlock.h:61 ./arch/s390/include/asm/spinlock.h:66 ./include/linux/spinlock.h:187 ./include/linux/spinlock_api_smp.h:120 kernel/workqueue.c:3346)
> [ 14.271777] kthread (kernel/kthread.c:389)
> [ 14.271781] __ret_from_fork (arch/s390/kernel/process.c:62)
> [ 14.271784] ret_from_fork (arch/s390/kernel/entry.S:309)
> [ 14.271806] Last Breaking-Event-Address:
> [ 14.271807] mutex_unlock (kernel/locking/mutex.c:549)
>
> So it seems to me that `worker->pool` is NULL in the
> `workqueue.c:worker_thread` function and this leads to the crash.
>

I'm not familiar with s390 asm code, but it might be the case that
`worker->pool` is NULL in worker_thread(), since detach_worker()
resets worker->pool to NULL.

If that is the case, READ_ONCE(worker->pool) should be used in worker_thread()
to fix the problem.

(It would be odd to me if worker->pool were read more than once in
worker_thread(), given how many times it is used there, but since
READ_ONCE() is not used, it is possible.)
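
For illustration, here is a minimal userspace sketch of the idea. It is not
the actual workqueue.c code: READ_ONCE() is approximated with a volatile
cast, the structs are cut down to one field each, and
pool_nr_workers_snapshot() is a hypothetical helper. The only point is that
worker->pool is loaded once into a local, and the local is used afterwards.

```c
#include <stddef.h>

/*
 * Userspace stand-in for the kernel's READ_ONCE() (assumption: GCC/Clang,
 * which provide __typeof__). The volatile access makes the compiler emit
 * exactly one load and forbids re-reading the location later.
 */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

/* Cut-down stand-ins for the kernel structures, for illustration only. */
struct worker_pool {
	int nr_workers;
};

struct worker {
	struct worker_pool *pool;	/* may be reset to NULL concurrently */
};

/*
 * Hypothetical helper: snapshot worker->pool once and use only the local
 * copy, so a concurrent store of NULL cannot be observed halfway through.
 * Returns -1 if the worker was already detached at the time of the load.
 */
static int pool_nr_workers_snapshot(struct worker *worker)
{
	struct worker_pool *pool = READ_ONCE(worker->pool);

	if (!pool)
		return -1;

	return pool->nr_workers;
}
```

Note that this only keeps the compiler from reloading worker->pool; whether
the pool is still safe to use after detach_worker() has run is a separate
question.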

Thanks
Lai