Re: [BUG] RCU hang with io_uring nvme polling

From: Jens Axboe

Date: Fri Jun 26 2026 - 11:18:19 EST

On 6/26/26 9:09 AM, Ben Carey wrote:
> From: benjamin.james.carey3@xxxxxxxxx
>
> Hello, to whom it may concern.
>
> I am working in a lab researching energy efficiency of I/O servicing and
> completion mechanisms, and we have encountered an issue when using io_uring and
> completing I/O requests while polling NVMe drives.
>
> Description
> ===========
>
> When using fio to run io_uring benchmarks for energy consumption analysis
> on our lab server, we encounter strange kernel locking behavior as
> numjobs increases.
>
> This issue occurs on our workloads that poll for I/O completion. Specifically,
> whenever the numjobs parameter exceeds the nvme.poll_queues parameter, the
> job takes much longer to complete or doesn't complete at all.
>
> Notably, this issue also occurs on a QEMU image that mimics our setup. Using
> GDB to read the dmesg output, we get the following:
>
> ...
> rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> rcu: Tasks blocked on level-0 rcu_node (CPUs 0-7): P1070
> rcu: (detected by 7, t=252035 jiffies, g=1985, q=25149 ncpus=8)
> task:fio state:R running task stack:13296 pid:1070 tgid:1070 ppid:1068 task_flags:0x400140 flags:0x00080000
> Call Trace:
> ...
> ? blk_hctx_poll+0x34/0x80
> blk_mq_poll+0x2b/0x40
> bio_poll+0x94/0x180
> iocb_bio_iopoll+0x31/0x50
> io_uring_classic_poll+0x20/0x40
> io_do_iopoll+0x233/0x430
> ? io_issue_sqe+0x2f/0x560
> ? io_submit_sqes+0x270/0x820
> __do_sys_io_uring_enter+0x228/0x770
> ? handle_softirqs+0xc7/0x250
> __x64_sys_io_uring_enter+0x21/0x30
> x64_sys_call+0x17c8/0x1dd0
> do_syscall_64+0xe0/0x5a0
> entry_SYSCALL_64_after_hwframe+0x77/0x7f
>
> Expected behavior
> =================
>
> The fio job completes after the specified runtime.
>
> Actual behavior
> ===============
>
> The fio job never completes, the system becomes less responsive (if the
> numbers of poll queues and jobs are high), and the RCU stall detector reports
> stalls.
>
> Observations
> ============
>
> After some minimal investigation, we found that the following function is
> called as the q->mq_ops->poll callback:
>
> static int nvme_poll(struct blk_mq_hw_ctx *hctx, struct io_comp_batch *iob)
> {
> 	struct nvme_queue *nvmeq = hctx->driver_data;
> 	bool found;
>
> 	if (!test_bit(NVMEQ_POLLED, &nvmeq->flags) ||
> 	    !nvme_cqe_pending(nvmeq))
> 		return 0;
>
> 	spin_lock(&nvmeq->cq_poll_lock);
> 	found = nvme_poll_cq(nvmeq, iob);
> 	spin_unlock(&nvmeq->cq_poll_lock);
>
> 	return found;
> }
>
> While fio is stuck in the stalled polling loop, this function always returns
> 0. It also always calls the helper function nvme_cqe_pending().
>
> The following steps may help in reproducing this issue.
>
> Steps to reproduce
> ==================
> From a running QEMU image with the latest kernel:
> 1. Attach GDB to the running instance.
> 2. Enable io polling via sysfs (echo 1 > /sys/block/nvme0n1/queue/io_poll).

That's not how that works at all. You need to set up poll queues on the
nvme driver side, using the nvme.poll_queues=XX kernel parameter, or if
using nvme as a module, load the module with poll_queues=XX where XX is
the number of poll queues. You're not doing any polled IO as-is, and the
above should also have dumped a dmesg message about how that does
absolutely nothing.

That said, it should still work, just not doing polled IO. I'll take a
look sometime next week, OOO right now.

--
Jens Axboe