[PATCH AUTOSEL 6.19-5.10] nvme-pci: ensure we're polling a polled queue

From: Sasha Levin

Date: Tue Mar 10 2026 - 05:06:23 EST

From: Keith Busch <kbusch@xxxxxxxxxx>

[ Upstream commit 166e31d7dbf6aa44829b98aa446bda5c9580f12a ]

A user can change the polled queue count at run time. There's a brief
window during a reset where a hipri task may try to poll that queue
before the block layer has updated the queue maps, which would race with
the now interrupt driven queue and may cause double completions.

Reviewed-by: Christoph Hellwig <hch@xxxxxx>
Reviewed-by: Kanchan Joshi <joshi.k@xxxxxxxxxxx>
Signed-off-by: Keith Busch <kbusch@xxxxxxxxxx>
Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>
---

LLM Generated explanations, may be completely bogus:

Now I have all the information needed for my analysis.

## Analysis

### 1. Commit Message Analysis

The commit message clearly describes a **race condition** that can cause
**double completions**:
- A user can change the polled queue count at runtime
- During a controller reset, there's a window where a hipri task may
poll a queue that has transitioned from polled to interrupt-driven
- Both the polling path and the interrupt handler could complete the
same I/O, causing double completions

Double completions in the block layer are a serious bug — they can lead
to use-after-free (the request is freed after the first completion, then
accessed again during the second completion), data corruption, or kernel
crashes.

### 2. Code Change Analysis

The fix is minimal — a single additional check in `nvme_poll()`:

```c
- if (!nvme_cqe_pending(nvmeq))
+ if (!test_bit(NVMEQ_POLLED, &nvmeq->flags) ||
+ !nvme_cqe_pending(nvmeq))
return 0;
```

Before polling the completion queue, verify the queue is actually still
a polled queue via `test_bit(NVMEQ_POLLED, ...)`. If the queue has
transitioned to interrupt-driven mode (the NVMEQ_POLLED bit was cleared
in `nvme_suspend_queue()`), skip the poll and return 0.

This is a 2-line change in a single file. The risk of regression is
extremely low.

### 3. Classification

- **Race condition fix** — prevents concurrent polling and interrupt-
driven completion of the same queue
- **Prevents double completions** — which are a serious kernel bug
(potential UAF, crashes)
- Authored by Keith Busch (NVMe maintainer), reviewed by Christoph
Hellwig and Kanchan Joshi

### 4. Scope and Risk

- **1 file changed, 2 insertions, 1 deletion** — extremely small and
contained
- The added check is a fast bit test that simply returns early if the
queue isn't polled
- No functional changes to the normal code path — polled queues still
poll as before
- Only affects the edge case of runtime queue configuration changes

### 5. Dependencies

The `NVMEQ_POLLED` flag has existed since kernel v5.2 (commit
7c349dde26b7, March 2019). The `test_bit(NVMEQ_POLLED, ...)` pattern is
already used elsewhere in the same file (e.g., `nvme_poll_irqdisable()`,
`nvme_suspend_queue()`). This fix is fully self-contained and applies
cleanly without any prerequisites.

The commit appears alongside two related commits (`44b798a6fd680` "cap
queue creation" and `02bfc966463a7` "do not try to add queue maps at
runtime"), but this fix is independent — it guards against an already-
existing race in `nvme_poll()` regardless of the other changes.

### 6. User Impact

NVMe is one of the most widely used storage interfaces in modern
systems. Any system using polled I/O queues (common for high-performance
storage workloads) and changing queue configuration at runtime could hit
this race. Double completions can cause kernel panics, data corruption,
or use-after-free.

### Verification

- Verified upstream commit is `166e31d7dbf6` which appeared in
`v7.0-rc3`
- Confirmed `NVMEQ_POLLED` flag was introduced in commit `7c349dde26b7`
(v5.2), so it exists in all current stable trees
- Confirmed the fix is a 2-line change modifying only
`drivers/nvme/host/pci.c`
- Confirmed the code pattern (`test_bit(NVMEQ_POLLED, ...)`) is already
used elsewhere in the same file, particularly in
`nvme_poll_irqdisable()` at line 1631
- Verified the commit is reviewed by Christoph Hellwig and Kanchan Joshi
- Confirmed the fix is self-contained — no dependency on the two
companion commits

### Conclusion

This is a textbook stable backport candidate: a tiny, surgical fix for a
race condition that can cause double completions (potential UAF/crash),
in a critical subsystem (NVMe storage), authored and reviewed by
subsystem maintainers, with zero risk of regression.

**YES**

drivers/nvme/host/pci.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 74aca8846fffe..3f4af341474e1 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1499,7 +1499,8 @@ static int nvme_poll(struct blk_mq_hw_ctx *hctx, struct io_comp_batch *iob)
struct nvme_queue *nvmeq = hctx->driver_data;
bool found;

- if (!nvme_cqe_pending(nvmeq))
+ if (!test_bit(NVMEQ_POLLED, &nvmeq->flags) ||
+ !nvme_cqe_pending(nvmeq))
return 0;

spin_lock(&nvmeq->cq_poll_lock);
--
2.51.0