Re: WARN_ON_ONCE() in process_one_work()?

From: Tejun Heo
Date: Mon May 01 2017 - 14:42:57 EST


Hello, Paul.

Hmmm... Steven reported a similar issue.

http://lkml.kernel.org/r/20170405151628.33df783f@xxxxxxxxxxxxxxxxxx

On Mon, May 01, 2017 at 09:57:47AM -0700, Paul E. McKenney wrote:
> Hello!
>
> I am hitting this WARN_ON_ONCE() in process_one_work() and am wondering
> what I did wrong to make this happen:
>
> ------------------------------------------------------------------------
>
> static void process_one_work(struct worker *worker, struct work_struct *work)
> __releases(&pool->lock)
> __acquires(&pool->lock)
> {
> 	struct pool_workqueue *pwq = get_work_pwq(work);
> 	struct worker_pool *pool = worker->pool;
> 	bool cpu_intensive = pwq->wq->flags & WQ_CPU_INTENSIVE;
> 	int work_color;
> 	struct worker *collision;
> #ifdef CONFIG_LOCKDEP
> 	/*
> 	 * It is permissible to free the struct work_struct from
> 	 * inside the function that is called from it, this we need to
> 	 * take into account for lockdep too. To avoid bogus "held
> 	 * lock freed" warnings as well as problems when looking into
> 	 * work->lockdep_map, make a copy and use that here.
> 	 */
> 	struct lockdep_map lockdep_map;
>
> 	lockdep_copy_map(&lockdep_map, &work->lockdep_map);
> #endif
> 	/* ensure we're on the correct CPU */
> 	WARN_ON_ONCE(!(pool->flags & POOL_DISASSOCIATED) &&
> 		     raw_smp_processor_id() != pool->cpu);
>
> ------------------------------------------------------------------------
>
> Here is the splat:
>
> ------------------------------------------------------------------------
>
> [12600.593006] WARNING: CPU: 0 PID: 6 at /home/paulmck/public_git/linux-rcu/kernel/workqueue.c:2041 process_one_work+0x46c/0x4d0
> [12600.593006] Modules linked in:
> [12600.593006] CPU: 0 PID: 6 Comm: mm_percpu_wq Not tainted 4.11.0-rc7+ #1
> [12600.593006] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
> [12600.593006] Call Trace:
> [12600.593006] dump_stack+0x4f/0x72
> [12600.593006] __warn+0xc6/0xe0
> [12600.593006] warn_slowpath_null+0x18/0x20
> [12600.593006] process_one_work+0x46c/0x4d0
> [12600.593006] rescuer_thread+0x20e/0x3b0
> [12600.593006] kthread+0x104/0x140
> [12600.593006] ? worker_thread+0x4e0/0x4e0
> [12600.593006] ? kthread_create_on_node+0x40/0x40
> [12600.593006] ret_from_fork+0x29/0x40
>
> ------------------------------------------------------------------------
>
> This happens about 3.5 hours into the TREE03 rcutorture scenario, .config
> attached.

Steven's case involved a rescuer too. One possibility is that cpuset
is involved somehow and is unexpectedly changing the affinity of the
rescuer kthread. Is cpuset involved in any way?
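
One rough way to look at a kthread's affinity from userspace is to read
its allowed-CPU mask with sched_getaffinity(2). The helper below is only
a sketch of that idea: the name task_pinned_to_cpu() is made up for
illustration and is not part of any kernel or libc interface. A rescuer
attaches to different pools over time, so the mask it reports is a
snapshot and at best a hint, not proof of what the affinity was when the
warning fired.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <sys/types.h>

/*
 * Hypothetical helper: report whether the task identified by `pid`
 * (0 means the calling thread) is allowed to run on `cpu` and on no
 * other CPU.
 *
 * Returns 1 if the affinity mask contains exactly `cpu`, 0 if the mask
 * is anything else, and -1 if the mask could not be read.
 */
static int task_pinned_to_cpu(pid_t pid, int cpu)
{
	cpu_set_t mask;

	CPU_ZERO(&mask);
	if (sched_getaffinity(pid, sizeof(mask), &mask) != 0)
		return -1;

	return CPU_COUNT(&mask) == 1 && CPU_ISSET(cpu, &mask);
}
```

For the splat above, one could call task_pinned_to_cpu(6, 0), since the
warning was reported for PID 6 (mm_percpu_wq) on CPU 0. A return of 0
only says that the mask is currently something other than CPU 0 alone.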

Thanks.

--
tejun