Re: [RFC 2/2] workqueue: Fix work re-entrance when requeue to a different workqueue

From: Lai Jiangshan
Date: Fri Oct 08 2021 - 22:06:39 EST


On Fri, Oct 8, 2021 at 6:06 PM Boqun Feng <boqun.feng@xxxxxxxxx> wrote:
>
> When a work item is requeued to a different workqueue while it is still
> being processed, the following re-entrance can happen:
>
> { both WQ1 and WQ2 are bound workqueues, and a work W has been
> queued on CPU 0 for WQ1 }
>
> CPU 0                                   CPU 1
> =====                                   =====
> <In worker on CPU 0>
> process_one_work():
>   ...
>   // pick up W
>   worker->current_work = W;
>   worker->current_func = W->func;
>   ...
>   set_work_pool_and_clear_pending(...);
>   // W can be requeued afterwards
>                                         queue_work_on(1, WQ2, W):
>                                           if (!test_and_set_bit(...)) {
>                                             // this branch is taken, as CPU 0
>                                             // just cleared the pending bit.
>                                             __queue_work(...):
>                                               pwq = <pool for CPU 1 of WQ2>;
>                                               last_pool = <pool for CPU 0 of WQ1>;
>                                               if (last_pool != pwq->pool) { // true
>                                                 if (.. && worker->current_pwq->wq == wq) {
>                                                   // false, since @worker is a
>                                                   // worker of @last_pool (for
>                                                   // WQ1), and @wq is WQ2.
>                                                 }
>                                                 ...
>                                               insert_work(pwq, W, ...);
>                                           }
>                                           // W queued.
>                                         <schedule to worker on CPU 1>
>                                         process_one_work():
>                                           collision = find_worker_executing_work(..);
>                                           // NULL, because we're searching the
>                                           // worker pool of CPU 1, while W is
>                                           // the current work on the worker
>                                           // pool of CPU 0.
>                                           worker->current_work = W;
>                                           worker->current_func = W->func;
>   worker->current_func(...);
>                                           ...
>                                           worker->current_func(...); // Re-entrance
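A toy user-space model of why the collision lookup on CPU 1 comes back empty. This is a sketch, not kernel code: `struct toy_pool`, `struct toy_worker`, `toy_find_worker()` and `toy_w_func()` are hypothetical stand-ins. They only mimic the matching rule of find_worker_executing_work(), which checks `current_work` and `current_func`, and the fact that the search is confined to a single pool.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real layout. */
typedef void (*work_func_t)(void *);

struct toy_work {
	work_func_t func;
};

struct toy_worker {
	struct toy_work *current_work;	/* NULL when idle */
	work_func_t current_func;
};

#define TOY_NR_WORKERS 2

struct toy_pool {
	/* Workers of this pool only; other pools are never searched. */
	struct toy_worker workers[TOY_NR_WORKERS];
};

/*
 * Mimics the matching rule of find_worker_executing_work(): a worker of
 * @pool matches if it is running the same work item with the same
 * function. The real function walks pool->busy_hash, not an array.
 */
static struct toy_worker *toy_find_worker(struct toy_pool *pool,
					  struct toy_work *work)
{
	for (size_t i = 0; i < TOY_NR_WORKERS; i++) {
		struct toy_worker *w = &pool->workers[i];

		if (w->current_work == work && w->current_func == work->func)
			return w;
	}
	return NULL;
}

/* Example work function used for W. */
static void toy_w_func(void *arg)
{
	(void)arg;
}
```

With W marked as running on the CPU 0 pool, a lookup on the CPU 1 pool returns NULL, which is exactly the missed collision in the diagram.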

Concurrent or parallel executions of the same work item are not
considered "Re-entrance" if the workqueue has changed.

This allows a work function to free its own work item, after which
another subsystem may allocate the same memory and reuse it as a work item.

"Re-entrance" is defined as concurrent execution where all of these hold:
  - the work function has not been changed,
  - the wq has not been changed, and
  - the item has not been reinitialized.
(To keep the check simple, the workqueue subsystem often also treats it as
"Re-entrance" when a condition has changed and then changed back. But wq
users should not depend on this behavior and should avoid it.)
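One way to read the three conditions above is as a predicate over a snapshot taken when an execution starts. This sketch assumes a hypothetical `init_gen` counter that is bumped on every (re)initialization; the real workqueue code keeps no such counter, so it only illustrates the definition, not how the kernel checks it. All `toy_*` names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

typedef void (*work_func_t)(void *);

struct toy_wq {
	const char *name;
};

/* Hypothetical: the kernel's work_struct has no generation counter. */
struct toy_work {
	work_func_t func;
	unsigned int init_gen;	/* bumped by toy_init_work() */
};

/* What the item looked like when the still-running execution started. */
struct toy_exec {
	work_func_t func;
	const struct toy_wq *wq;
	unsigned int init_gen;
};

/* Stand-in for (re)initializing the item, e.g. after it was reused. */
static void toy_init_work(struct toy_work *work, work_func_t func)
{
	work->func = func;
	work->init_gen++;
}

/*
 * A new execution of @work on @wq counts as re-entrance of @running only
 * if the work function, the wq and the item's initialization are all
 * unchanged. Comparing only the start snapshot with the current state
 * also means "changed and then changed back" is reported as re-entrance.
 */
static bool toy_is_reentrance(const struct toy_exec *running,
			      const struct toy_work *work,
			      const struct toy_wq *wq)
{
	return running->func == work->func &&
	       running->wq == wq &&
	       running->init_gen == work->init_gen;
}
```

Requeuing W to WQ2 while it still runs from WQ1 is then not re-entrance under this definition, and neither is reusing reinitialized memory.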


>
> This issue is already partially handled: in __queue_work(), last_pool
> can be used to queue the work, so that processing of the requeued work
> finds the collision and waits for the existing execution to finish.
> However, last_pool is currently only used when the two workqueues are
> the same, which leaves this case open. Therefore extend the behavior to
> allow last_pool to be used for requeuing W even if the workqueues are
> different. This is safe because W has already been shown to be safe to
> queue and run on last_pool.
>
> Signed-off-by: Boqun Feng <boqun.feng@xxxxxxxxx>
> ---
> kernel/workqueue.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index 1418710bffcd..410141cc5f88 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -1465,7 +1465,7 @@ static void __queue_work(int cpu, struct workqueue_struct *wq,
>
>  		worker = find_worker_executing_work(last_pool, work);
>
> -		if (worker && worker->current_pwq->wq == wq) {
> +		if (worker) {
>  			pwq = worker->current_pwq;
>  		} else {
>  			/* meh... not running there, queue here */
> --
> 2.32.0
>
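The effect of the one-line change on the choice of target pwq can be sketched in isolation. This is a simplified model, not the kernel function: `toy_pick_pwq()` and the `toy_*` types are hypothetical, locking is omitted, and `@worker` stands for the result of find_worker_executing_work(last_pool, work) after the `last_pool != pwq->pool` check has already passed. `@match_wq` selects the pre-patch rule (true) or the patched rule (false).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; only the fields the selection looks at. */
struct toy_wq {
	const char *name;
};

struct toy_pool {
	int cpu;
};

struct toy_pwq {
	struct toy_wq *wq;
	struct toy_pool *pool;
};

struct toy_worker {
	struct toy_pwq *current_pwq;
};

static struct toy_pwq *toy_pick_pwq(struct toy_wq *wq, struct toy_pwq *pwq,
				    struct toy_worker *worker, bool match_wq)
{
	/* Queue behind the running instance so its pool sees the collision. */
	if (worker && (!match_wq || worker->current_pwq->wq == wq))
		return worker->current_pwq;

	/* "meh... not running there, queue here" */
	return pwq;
}
```

In the scenario from the changelog, the pre-patch rule keeps W on WQ2's pwq for CPU 1 even though CPU 0 is still running it. The patched rule places W on the running worker's pwq (WQ1's pwq on CPU 0), so the later find_worker_executing_work() on that pool finds the busy worker.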