[RFC 2/2] workqueue: Fix work re-entrance when requeue to a different workqueue

From: Boqun Feng
Date: Fri Oct 08 2021 - 06:06:46 EST


When requeuing a work to a different workqueue while it's still getting
processed, re-entrace as the follow can happen:

{ both WQ1 and WQ2 are bounded workqueue, and a work W has been
queued on CPU0 for WQ1}

CPU 0 CPU 1
===== ====
<In worker on CPU 0>
process_one_work():
...
// pick up W
worker->current_work = W;
worker->current_func = W->func;
...
set_work_pool_and_clear_pending(...);
// W can be requeued afterwards
queue_work_on(1, WQ2, W):
if (!test_and_set_bit(...)) {
// this branch is taken, as CPU 0
// just clears pending bit.
__queue_work(...):
pwq = <pool for CPU1 of WQ2>;
last_pool = <pool for CPU 0 of WQ1>;
if (last_pool != pwq->pool) { // true
if (.. && worker->current_pwq->wq == wq) {
// false, since @worker is a
// a worker of @last_pool (for
// WQ1), and @wq is WQ2.
}
...
insert_work(pwq, W, ...);
}
// W queued.
<schedule to worker on CPU 1>
process_one_work():
collision = find_worker_executing_work(..);
// NULL, because we're searching the
// worker pool of CPU 1, while W is
// the current work on worker pool of
// CPU 0.
worker->current_work = W;
worker->current_func = W->func;
worker->current_func(...);
...
worker->current_func(...); // Re-entrance

This issue is already partially fixed because in queue_work_on(),
last_pool can be used to queue the work, as a result the requeued work
processing will find the collision and wait for the existing one to
finish. However, currently the last_pool is only used when two
workqueues are the same one, which causes the issue. Therefore extend
the behavior to allow last_pool to requeue the work W even if the
workqueues are different. It's safe to do this since the work W has been
proved safe to queue and run on the last_pool.

Signed-off-by: Boqun Feng <boqun.feng@xxxxxxxxx>
---
kernel/workqueue.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 1418710bffcd..410141cc5f88 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1465,7 +1465,7 @@ static void __queue_work(int cpu, struct workqueue_struct *wq,

worker = find_worker_executing_work(last_pool, work);

- if (worker && worker->current_pwq->wq == wq) {
+ if (worker) {
pwq = worker->current_pwq;
} else {
/* meh... not running there, queue here */
--
2.32.0