Re: BUG: workqueue lockup - SRCU schedules work on not-online CPUs during size transition
From: Tejun Heo
Date: Thu Apr 09 2026 - 13:47:18 EST
On Thu, Apr 09, 2026 at 10:40:05AM -0700, Boqun Feng wrote:
> On Thu, Apr 09, 2026 at 10:26:49AM -0700, Boqun Feng wrote:
> > On Thu, Apr 09, 2026 at 03:08:45PM +0200, Vasily Gorbik wrote:
> > > Commit 61bbcfb50514 ("srcu: Push srcu_node allocation to GP when
> > > non-preemptible") defers srcu_node tree allocation when called under
> > > raw spinlock, putting SRCU through ~6 transitional grace periods
> > > (SRCU_SIZE_ALLOC to SRCU_SIZE_BIG). During this transition srcu_gp_end()
> > > uses mask = ~0, which makes srcu_schedule_cbs_snp() call queue_work_on()
> > > for every possible CPU. Since rcu_gp_wq is WQ_PERCPU, work targets
> > > per-CPU pools directly - pools for not-online CPUs have no workers,
> >
> > [Cc workqueue]
> >
> > Hmm.. I thought that for offline CPUs the corresponding worker pools
> > become unbound, so there are still workers?
> >
>
> Ah, as Paul replied in another email, the problem was because these CPUs
> had never been onlined, so they don't even have unbound workers?
Hahaha, we do initialize a worker pool for every possible CPU, but the
transition to unbound operation happens in the hot unplug callback. We
probably need to do some of the hot unplug operations during init if the CPU
is possible but not online. That said, what kind of machine is it? Is the
firmware just reporting a bogus possible mask? How come the CPUs weren't
online during boot?
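
To make the idea concrete, here is a rough userspace model of the situation,
not actual workqueue code. Every name in it (model_pool, model_init_pools,
model_count_stuck, and so on) is hypothetical, and it assumes the simplest
possible picture: a pool can run work if its CPU was online at init or if it
has gone through the unplug-style switch to unbound operation.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 8	/* hypothetical size of the toy system */

/* Toy stand-in for a per-CPU worker pool; not the kernel's structure. */
struct model_pool {
	bool initialized;	/* pool was set up at init */
	bool unbound;		/* pool switched to unbound operation */
	bool can_run_work;	/* something is able to execute queued work */
};

static struct model_pool model_pools[MODEL_NR_CPUS];

/* Models the part of hot unplug that moves a pool to unbound operation. */
static void model_unplug_transition(int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	model_pools[cpu].unbound = true;
	model_pools[cpu].can_run_work = true;
}

/*
 * Initialize a pool for every possible CPU. When fixup is set, CPUs that
 * are possible but not online also get the unplug-style transition, which
 * is the assumption this sketch is meant to illustrate.
 */
static void model_init_pools(unsigned int possible, unsigned int online,
			     bool fixup)
{
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		struct model_pool *pool = &model_pools[cpu];
		bool is_possible = possible & (1u << cpu);
		bool is_online = online & (1u << cpu);

		pool->initialized = is_possible;
		pool->unbound = false;
		pool->can_run_work = is_possible && is_online;

		if (fixup && is_possible && !is_online)
			model_unplug_transition(cpu);
	}
}

/*
 * Queue one work item on every possible CPU, as a "mask = ~0" caller
 * would, and count the items that nothing would ever execute.
 */
static int model_count_stuck(unsigned int possible, unsigned int online,
			     bool fixup)
{
	int stuck = 0;

	model_init_pools(possible, online, fixup);
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		if (!model_pools[cpu].initialized)
			continue;
		if (!model_pools[cpu].can_run_work)
			stuck++;
	}
	return stuck;
}
```

With eight possible CPUs of which only the first four came online,
model_count_stuck(0xff, 0x0f, false) counts four stuck items, one per
never-onlined CPU, while passing fixup = true leaves none. The real change
would of course have to deal with worker creation and pool flags, which this
model deliberately ignores.
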
Thanks.
--
tejun