[PATCH] io_uring/io-wq: re-check IO_WQ_BIT_EXIT for each linked work item
From: Runyu Xiao
Date: Wed May 27 2026 - 10:47:56 EST

Commit bdf0bf73006e ("io_uring/io-wq: check IO_WQ_BIT_EXIT inside work
run loop") fixed the obvious case where io_worker_handle_work() took one
exit-bit snapshot before draining pending work, but the fix stops one
level too early.

io_worker_handle_work() now re-checks IO_WQ_BIT_EXIT on each pass of its
outer work run loop, yet it still snapshots that bit only once before
processing a whole dependent linked-work chain. If io_wq_exit_start()
sets IO_WQ_BIT_EXIT after the first linked item has started, the
remaining linked items still see the stale do_kill == false, are not
flagged with IO_WQ_WORK_CANCEL, and keep running after exit has begun.

That means the previous fix did not fully eliminate the exit-latency
problem; it only narrowed it to linked chains. A long or slow linked
chain can still keep io-wq exit waiting for work that should already
have been canceled.

The issue was found on Linux v6.18.21 by our static-analysis tool,
which flags linked-work loops that snapshot shared exit state outside
the per-item cancel decision, and was then confirmed by manual auditing
of io_worker_handle_work(). It was later reproduced with a no-device
QEMU validation selftest that models the same sequence: a three-node
unbound linked chain, an exit actor setting IO_WQ_BIT_EXIT after work1,
and slow post-exit linked work. With a 3000 ms delay injected into each
post-exit item, the buggy path spends about 6066 ms after exit running
work2 and work3, while the fixed path cancels both and finishes in
about 2 ms.

Re-check test_bit(IO_WQ_BIT_EXIT, &wq->state) on each iteration of the
dependent-link loop, right before deciding whether to cancel the
current work item. That closes the remaining stale-snapshot window and
prevents linked post-exit work from stretching shutdown latency.

Build-tested by compiling io_uring/io-wq.o on x86_64 with the local
.config. No special hardware was required.

Fixes: bdf0bf73006e ("io_uring/io-wq: check IO_WQ_BIT_EXIT inside work run loop")
Cc: stable@xxxxxxxxxxxxxxx
Signed-off-by: Runyu Xiao <runyu.xiao@xxxxxxxxxx>
---
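For reviewers, a userspace sketch of the stale-snapshot window described
in the changelog. It is a deterministic single-threaded model, not kernel
code: struct model_wq, enum check_mode and run_chain() are hypothetical
names, the plain bool stands in for IO_WQ_BIT_EXIT in wq->state, and
"cancel" is modeled simply as not counting the item as executed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the exit bit in wq->state. */
struct model_wq {
	bool exiting;
};

enum check_mode {
	SNAPSHOT_PER_CHAIN,	/* do_kill read once, before the link loop */
	RECHECK_PER_ITEM	/* do_kill re-read for every linked item */
};

/*
 * Walk a linked chain of @len items. The exit flag is set right after
 * item @exit_after (0-based) has been handled, modeling an exit request
 * that arrives while the chain is in flight.
 * Returns how many items ran instead of being canceled.
 */
static int run_chain(struct model_wq *wq, enum check_mode mode,
		     int len, int exit_after)
{
	bool do_kill = wq->exiting;	/* snapshot taken before the chain */
	int executed = 0;

	assert(len >= 0);

	for (int i = 0; i < len; i++) {
		if (mode == RECHECK_PER_ITEM)
			do_kill = wq->exiting;	/* fresh read per item */

		if (!do_kill)
			executed++;		/* item runs to completion */
		/* else: item would be canceled */

		if (i == exit_after)
			wq->exiting = true;	/* exit begins here */
	}

	return executed;
}
```

With a three-item chain and the flag set after the first item,
SNAPSHOT_PER_CHAIN counts all three items as executed, while
RECHECK_PER_ITEM counts only the first, which matches the work1 versus
work2/work3 split in the selftest description.
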
 io_uring/io-wq.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/io_uring/io-wq.c b/io_uring/io-wq.c
index 49a9c914b4e9..28d81398ebee 100644
--- a/io_uring/io-wq.c
+++ b/io_uring/io-wq.c
@@ -601,7 +601,6 @@ static void io_worker_handle_work(struct io_wq_acct *acct,
 	struct io_wq *wq = worker->wq;

 	do {
-		bool do_kill = test_bit(IO_WQ_BIT_EXIT, &wq->state);
 		struct io_wq_work *work;

 		/*
@@ -637,6 +636,7 @@ static void io_worker_handle_work(struct io_wq_acct *acct,

 		/* handle a whole dependent link */
 		do {
+			bool do_kill = test_bit(IO_WQ_BIT_EXIT, &wq->state);
 			struct io_wq_work *next_hashed, *linked;
 			unsigned int work_flags = atomic_read(&work->flags);
 			unsigned int hash = __io_wq_is_hashed(work_flags)
--
2.34.1