Re: BUG: workqueue lockup - SRCU schedules work on not-online CPUs during size transition

From: Paul E. McKenney

Date: Fri Apr 10 2026 - 15:17:27 EST

On Fri, Apr 10, 2026 at 08:53:30AM -1000, Tejun Heo wrote:
> Hello,
>
> On Thu, Apr 09, 2026 at 11:10:04AM -0700, Boqun Feng wrote:
> > On Thu, Apr 09, 2026 at 07:47:09AM -1000, Tejun Heo wrote:
> > > On Thu, Apr 09, 2026 at 10:40:05AM -0700, Boqun Feng wrote:
> > > > On Thu, Apr 09, 2026 at 10:26:49AM -0700, Boqun Feng wrote:
> > > > > On Thu, Apr 09, 2026 at 03:08:45PM +0200, Vasily Gorbik wrote:
> > > > > > Commit 61bbcfb50514 ("srcu: Push srcu_node allocation to GP when
> > > > > > non-preemptible") defers srcu_node tree allocation when called under
> > > > > > raw spinlock, putting SRCU through ~6 transitional grace periods
> > > > > > (SRCU_SIZE_ALLOC to SRCU_SIZE_BIG). During this transition srcu_gp_end()
> > > > > > uses mask = ~0, which makes srcu_schedule_cbs_snp() call queue_work_on()
> > > > > > for every possible CPU. Since rcu_gp_wq is WQ_PERCPU, work targets
> > > > > > per-CPU pools directly - pools for not-online CPUs have no workers,
> > > > >
> > > > > [Cc workqueue]
> > > > >
> > > > > Hmm.. I thought for offline CPUs the corresponding worker pools become a
> > > > > unbound one hence there are still workers?
> > > > >
> > > >
> > > > Ah, as Paul replied in another email, the problem was because these CPUs
> > > > had never been onlined, so they don't even have unbound workers?
> > >
> > > Hahaha, we do initialize worker pool for every possible CPU but the
> > > transition to unbound operation happens in the hot unplug callback. We
> >
> > ;-) ;-) ;-)
> >
> > > probably need to do some of the hot unplug operation during init if the CPU
> >
> > Seems that we (mostly Paul) have our own trick to track whether a CPU
> > has ever been onlined in RCU, see rcu_cpu_beenfullyonline(). Paul also
> > used it in his fix [1]. And I think it won't be that hard to copy it
> > into workqueue and let queue_work_on() use it so that if the user queues
> > a work on a never-onlined CPU, it can detect it (with a warning?) and do
> > something?
>
> The easiest way to do this is just creating the initial workers for all
> possible pools. Please see below. However, the downside is that it's going
> to create all workers for all possible cpus. This isn't a problem for
> anybody else but these IBM mainframes often come up with a lot of possible
> but not-yet-or-ever-online CPUs for capacity management, so the cost may not
> be negligible on some configurations.
>
> IBM folks, is that okay?

I have also seen x86 systems whose firmware claimed very large numbers
of CPUs. :-(

> Also, why do you need to queue work items on an offline CPU? Do they
> actually have to be per-cpu? Can you get away with using an unbound
> workqueue?

It is good for them to run on the specified CPU in the common case for
cache-locality reasons, but if they were occasionally redirected to some
other CPU, that would be just fine.

I am also keeping the patch that avoids queueing work to CPUs that are not
yet fully online. Further adjustments will be needed if someone invokes
call_srcu(), synchronize_srcu(), or synchronize_srcu_expedited() from an
CPU that is not yet fully online. Past experience of course suggests that
this will be happen, and that there will be a good reason for it. ;-)

Thanx, Paul

> Thanks.
>
> From: Tejun Heo <tj@xxxxxxxxxx>
> Subject: workqueue: Create workers for all possible CPUs on init
>
> Per-CPU worker pools are initialized for every possible CPU during early boot,
> but workqueue_init() only creates initial workers for online CPUs. On systems
> where possible CPUs outnumber online CPUs (e.g. s390 LPARs with 76 online and
> 400 possible CPUs), the pools for never-onlined CPUs have POOL_DISASSOCIATED
> set but no workers. Any work item queued on such a CPU hangs indefinitely.
>
> This was exposed by 61bbcfb50514 ("srcu: Push srcu_node allocation to GP when
> non-preemptible") which made SRCU schedule callbacks on all possible CPUs
> during size transitions, triggering workqueue lockup warnings for all
> never-onlined CPUs.
>
> Create workers for all possible CPUs during init, not just online ones. For
> online CPUs, the behavior is unchanged - POOL_DISASSOCIATED is cleared and the
> worker is bound to the CPU. For not-yet-online CPUs, POOL_DISASSOCIATED
> remains set, so worker_attach_to_pool() marks the worker UNBOUND and it can
> execute on any CPU. When the CPU later comes online, rebind_workers() handles
> the transition to associated operation as usual.
>
> Reported-by: Vasily Gorbik <gor@xxxxxxxxxxxxx>
> Signed-off-by: Tejun Heo <tj@xxxxxxxxxx>
> Cc: Boqun Feng <boqun@xxxxxxxxxx>
> Cc: Paul E. McKenney <paulmck@xxxxxxxxxx>
> ---
> kernel/workqueue.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -8068,9 +8068,10 @@ void __init workqueue_init(void)
> for_each_bh_worker_pool(pool, cpu)
> BUG_ON(!create_worker(pool));
>
> - for_each_online_cpu(cpu) {
> + for_each_possible_cpu(cpu) {
> for_each_cpu_worker_pool(pool, cpu) {
> - pool->flags &= ~POOL_DISASSOCIATED;
> + if (cpu_online(cpu))
> + pool->flags &= ~POOL_DISASSOCIATED;
> BUG_ON(!create_worker(pool));
> }
> }
> --
> tejun