Re: WARN_ON_ONCE() in process_one_work()?

From: Paul E. McKenney
Date: Tue Jul 03 2018 - 00:03:15 EST


On Mon, Jul 02, 2018 at 02:05:40PM -0700, Tejun Heo wrote:
> Hello, Paul.
>
> Sorry about the late reply.
>
> On Wed, Jun 20, 2018 at 12:29:01PM -0700, Paul E. McKenney wrote:
> > I have hit this WARN_ON_ONCE() in process_one_work:
> >
> > 	WARN_ON_ONCE(!(pool->flags & POOL_DISASSOCIATED) &&
> > 		     raw_smp_processor_id() != pool->cpu);
> >
> > This looks like it is my rcu_gp workqueue (see splat below), and it
> > appears to be intermittent. This happens on rcutorture scenario SRCU-N,
> > which does random CPU-hotplug operations (in case that helps).
> >
> > Is this related to the recent addition of WQ_MEM_RECLAIM? Either way,
> > what should I do to further debug this?
>
> Hmm... I checked the code paths but couldn't spot anything suspicious.
> Can you please apply the following patch and see whether it triggers
> before hitting the warn and if so report what it says?

I will apply this, but be advised that I have not seen that WARN_ON_ONCE()
trigger since. :-/

Thanx, Paul

> Thanks.
>
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index 0db8938fbb23..81caab9643b2 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -79,6 +79,15 @@ static struct lockdep_map cpuhp_state_up_map =
>  static struct lockdep_map cpuhp_state_down_map =
>  	STATIC_LOCKDEP_MAP_INIT("cpuhp_state-down", &cpuhp_state_down_map);
>
> +int cpuhp_current_state(int cpu)
> +{
> +	return per_cpu_ptr(&cpuhp_state, cpu)->state;
> +}
> +
> +int cpuhp_target_state(int cpu)
> +{
> +	return per_cpu_ptr(&cpuhp_state, cpu)->target;
> +}
>
>  static inline void cpuhp_lock_acquire(bool bringup)
>  {
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index 78b192071ef7..365cf6342808 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -1712,6 +1712,9 @@ static struct worker *alloc_worker(int node)
>  	return worker;
>  }
>
> +int cpuhp_current_state(int cpu);
> +int cpuhp_target_state(int cpu);
> +
>  /**
>   * worker_attach_to_pool() - attach a worker to a pool
>   * @worker: worker to be attached
> @@ -1724,13 +1727,20 @@ static struct worker *alloc_worker(int node)
>  static void worker_attach_to_pool(struct worker *worker,
>  				   struct worker_pool *pool)
>  {
> +	int ret;
> +
>  	mutex_lock(&wq_pool_attach_mutex);
>
>  	/*
>  	 * set_cpus_allowed_ptr() will fail if the cpumask doesn't have any
>  	 * online CPUs. It'll be re-applied when any of the CPUs come up.
>  	 */
> -	set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
> +	ret = set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
> +	if (ret && pool->cpu >= 0 && worker->rescue_wq)
> +		printk("XXX rescuer failed to attach: ret=%d pool=%d this_cpu=%d target_cpu=%d cpuhp_state=%d cpuhp_target=%d\n",
> +		       ret, pool->id, raw_smp_processor_id(), pool->cpu,
> +		       cpuhp_current_state(pool->cpu),
> +		       cpuhp_target_state(pool->cpu));
>
>  	/*
>  	 * The wq_pool_attach_mutex ensures %POOL_DISASSOCIATED remains
>
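
For anyone following the thread without the kernel tree at hand, the two conditions under discussion can be modelled in user space roughly as below. This is only a sketch, not kernel code: struct pool_model, would_warn() and would_print() are hypothetical names made up for illustration, and the POOL_DISASSOCIATED bit value here is a placeholder rather than something to rely on. would_warn() mirrors the WARN_ON_ONCE() condition quoted from process_one_work(), and would_print() mirrors the condition guarding the debug printk in the patch.

```c
#include <stdbool.h>

/* Placeholder bit value for illustration only; see kernel/workqueue.c. */
#define POOL_DISASSOCIATED (1u << 2)

/* Hypothetical stand-in for the two worker_pool fields the checks read. */
struct pool_model {
	int cpu;		/* CPU the pool is bound to, negative if unbound */
	unsigned int flags;	/* pool flag bits */
};

/*
 * Mirrors the WARN_ON_ONCE() condition: the pool still claims to be
 * associated with its CPU, yet the work item runs on a different CPU.
 */
static bool would_warn(const struct pool_model *pool, int this_cpu)
{
	return !(pool->flags & POOL_DISASSOCIATED) && this_cpu != pool->cpu;
}

/*
 * Mirrors the debug patch: report only when setting the affinity failed
 * (ret != 0), the pool is per-CPU, and the attaching worker is a rescuer.
 */
static bool would_print(int ret, const struct pool_model *pool, bool is_rescuer)
{
	return ret != 0 && pool->cpu >= 0 && is_rescuer;
}
```

In this model, a pool bound to CPU 2 with no flags set warns when this_cpu is 3 and stays silent once POOL_DISASSOCIATED is set. That is why the patch looks for a failed affinity change during attach: a rescuer that failed to move to pool->cpu would then process work on the wrong CPU while the pool is still marked associated.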