Re: [PATCH v2] workqueue: Add pool_workqueue to pending_pwqs list when unplugging multiple inactive works

From: Waiman Long

Date: Wed Apr 01 2026 - 14:05:19 EST

On 4/1/26 11:40 AM, Matthew Brost wrote:

On Wed, Apr 01, 2026 at 10:44:55AM -0400, Waiman Long wrote:

On 3/31/26 9:07 PM, Matthew Brost wrote:

In unplug_oldest_pwq(), the first inactive work item on the
pool_workqueue is activated correctly. However, if multiple inactive
works exist on the same pool_workqueue, subsequent works fail to
activate because wq_node_nr_active.pending_pwqs is empty — the list
insertion is skipped when the pool_workqueue is plugged.

Fix this by checking for additional inactive works in
unplug_oldest_pwq() and updating wq_node_nr_active.pending_pwqs
accordingly.

v2:
- Use pwq_activate_first_inactive(pwq, false) rather than open coding
list operations (Tejun)

Cc: Carlos Santa <carlos.santa@xxxxxxxxx>
Cc: Ryan Neph <ryanneph@xxxxxxxxxx>
Cc: stable@xxxxxxxxxxxxxxx
Cc: Tejun Heo <tj@xxxxxxxxxx>
Cc: Lai Jiangshan <jiangshanlai@xxxxxxxxx>
Cc: Waiman Long <longman@xxxxxxxxxx>
Cc: linux-kernel@xxxxxxxxxxxxxxx
Fixes: 4c065dbce1e8 ("workqueue: Enable unbound cpumask update on ordered workqueues")
Signed-off-by: Matthew Brost <matthew.brost@xxxxxxxxx>

---

This bug was first reported by Google, where the Xe driver appeared to
hang due to a fencing signal not completing. We traced the issue to work
items not being scheduled, and it can be trivially reproduced on drm-tip
with the following commands:

shell0:
for i in {1..100}; do echo "Run $i"; xe_exec_threads --r \
threads-rebind-bindexecqueue; done

shell1:
for i in {1..1000}; do echo "toggle $i"; echo f > \
/sys/devices/virtual/workqueue/cpumask; echo ff > \
/sys/devices/virtual/workqueue/cpumask; echo fff > \
/sys/devices/virtual/workqueue/cpumask ; echo ffff > \
/sys/devices/virtual/workqueue/cpumask; sleep .1; done
---
kernel/workqueue.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index b77119d71641..bee3f37fffde 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1849,8 +1849,17 @@ static void unplug_oldest_pwq(struct workqueue_struct *wq)
 	raw_spin_lock_irq(&pwq->pool->lock);
 	if (pwq->plugged) {
 		pwq->plugged = false;
-		if (pwq_activate_first_inactive(pwq, true))
+		if (pwq_activate_first_inactive(pwq, true)) {
+			/*
+			 * pwq is unbound. Additional inactive work_items need
+			 * to reinsert the pwq into nna->pending_pwqs, which
+			 * was skipped while pwq->plugged was true. See
+			 * pwq_tryinc_nr_active() for additional details.
+			 */
+			pwq_activate_first_inactive(pwq, false);
+
 			kick_pool(pwq->pool);
+		}
 	}
 	raw_spin_unlock_irq(&pwq->pool->lock);
 }

Thanks for fixing this bug. However, calling pwq_activate_first_inactive

No problem — I think this one has been lurking around for a while, and
we’ve just papered over it in Xe for a couple of years.

twice can be a bit hard to understand. Will modifying pwq_tryinc_nr_active()

I actually think it makes quite a bit of sense, as it matches what
__queue_work does if two items are added back-to-back on an ordered
workqueue — the first one updates the nr_active counts and activates,
and the second one updates the pending_pwqs.

This patch works because only an ordered workqueue with a max_active of 1 can be plugged. Perhaps you should put the note above into the comment too.
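
To make the back-to-back behaviour concrete, here is a minimal userspace sketch of the accounting described above. It is a toy model, not kernel code: struct toy_pwq, toy_tryinc(), toy_activate_first_inactive(), toy_complete_one() and toy_run() are all hypothetical names, a boolean stands in for membership on nna->pending_pwqs, and max_active is fixed at 1 as on an ordered workqueue.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: hypothetical types and helpers, not the kernel's. */
struct toy_pwq {
	bool plugged;
	bool on_pending;	/* stands in for !list_empty(&pwq->pending_node) */
	int inactive;		/* queued but not yet activated work items */
	int active;		/* activated work items */
};

static int node_nr_active;		/* stands in for the per-node count */
static const int node_max_active = 1;	/* ordered workqueue assumption */

/* Try to take an nr_active slot; on failure, mark the pwq as pending. */
static bool toy_tryinc(struct toy_pwq *pwq)
{
	if (pwq->plugged)
		return false;	/* pending insertion is skipped while plugged */
	if (node_nr_active < node_max_active) {
		node_nr_active++;
		return true;
	}
	pwq->on_pending = true;
	return false;
}

static bool toy_activate_first_inactive(struct toy_pwq *pwq)
{
	if (pwq->inactive == 0 || !toy_tryinc(pwq))
		return false;
	pwq->inactive--;
	pwq->active++;
	return true;
}

/* A work item finished: release the slot, then serve a pending pwq. */
static void toy_complete_one(struct toy_pwq *pwq)
{
	assert(pwq->active > 0);
	pwq->active--;
	node_nr_active--;
	if (pwq->on_pending) {
		pwq->on_pending = false;
		/* Stay pending if more inactive items remain. */
		if (toy_activate_first_inactive(pwq) && pwq->inactive > 0)
			pwq->on_pending = true;
	}
}

/* Queue three items while plugged, unplug, and count completions. */
static int toy_run(bool second_call)
{
	struct toy_pwq pwq = { .plugged = true, .inactive = 3 };
	int completed = 0;

	node_nr_active = 0;

	/* Unplug: the first call takes the slot and activates one item. */
	pwq.plugged = false;
	if (toy_activate_first_inactive(&pwq) && second_call)
		/* Gets no slot, but marks the pwq as pending. */
		toy_activate_first_inactive(&pwq);

	while (pwq.active > 0) {
		toy_complete_one(&pwq);
		completed++;
	}
	return completed;
}
```

Under these assumptions, toy_run(false) completes only the first of the three queued items, because nothing puts the pwq back on the pending list after the unplug, while toy_run(true) completes all three.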

like the following work?

My initial thought was that your snippet should work — in fact, it does
for a while (drm-tip hangs almost immediately), but eventually I do get
a hang when running my reproducer, whereas with this patch I don’t. I
can’t reason exactly why — maybe it’s because
node_activate_pending_pwq() can find a plugged pwq, but that’s just a
guess.

That may be the case. Thanks for checking it anyway.

Acked-by: Waiman Long <longman@xxxxxxxxxx>

Matt

Thanks,
Longman

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index b77119d71641..b35e6e62e474 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1738,9 +1738,6 @@ static bool pwq_tryinc_nr_active(struct pool_workqueue *pwq, bool fill)
 		goto out;
 	}
-	if (unlikely(pwq->plugged))
-		return false;
-
 	/*
 	 * Unbound workqueue uses per-node shared nr_active $nna. If @pwq is
 	 * already waiting on $nna, pwq_dec_nr_active() will maintain the
@@ -1749,13 +1746,19 @@ static bool pwq_tryinc_nr_active(struct pool_workqueue *pwq, bool fill)
 	 * We need to ignore the pending test after max_active has increased as
 	 * pwq_dec_nr_active() can only maintain the concurrency level but not
 	 * increase it. This is indicated by @fill.
+	 *
+	 * If @pwq is plugged, we need to make sure that it is linked to a
+	 * pending_pwqs of a $nna.
+	 *
 	 */
-	if (!list_empty(&pwq->pending_node) && likely(!fill))
+	if (!list_empty(&pwq->pending_node) && likely(!fill || pwq->plugged))
 		goto out;
-	obtained = tryinc_node_nr_active(nna);
-	if (obtained)
-		goto out;
+	if (likely(!pwq->plugged)) {
+		obtained = tryinc_node_nr_active(nna);
+		if (obtained)
+			goto out;
+	}
 	/*
 	 * Lockless acquisition failed. Lock, add ourself to $nna->pending_pwqs
@@ -1773,7 +1776,8 @@ static bool pwq_tryinc_nr_active(struct pool_workqueue *pwq, bool fill)
 	smp_mb();
-	obtained = tryinc_node_nr_active(nna);
+	if (likely(!pwq->plugged))
+		obtained = tryinc_node_nr_active(nna);
 	/*
 	 * If @fill, @pwq might have already been pending. Being spuriously