Re: [PATCH] nvme: reject completions for requests that are not in flight
From: Chao S
Date: Mon May 25 2026 - 16:27:49 EST
Hi,
Since posting this I reproduced a more severe manifestation of the same
bug and confirmed the patch handles it; sharing as extra justification.
The commit message covers the freed / never-dispatched case (the NULL
rq->mq_hctx dereference). When the stale command id instead maps to a
tag that has already been *reused*, the driver completes an unrelated,
still-in-flight request -- a use-after-free. Under fuzzing (a device
that replays and reorders completions) this did not show up as a clean
NULL deref but as cross-subsystem memory corruption: general protection
faults in mtree_range_walk(), unlink_anon_vmas() and the slub freelist,
in unrelated tasks (modprobe, systemd-udevd, ...). The trigger was a
stale completion delivered for a request that a concurrent controller
reset had just freed.
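As a rough illustration of why the generation counter alone does not stop
a replayed completion, here is a user-space model (not kernel code). All
model_* names are invented for this sketch, and the state handling is
simplified to "idle" vs "in flight"; it only assumes the 4-bit generation
/ 12-bit tag command id layout described in the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* User-space model only; every model_* name is hypothetical. */
enum model_state { MODEL_IDLE, MODEL_IN_FLIGHT };

struct model_rq {
	enum model_state state;
	uint8_t genctr;		/* bumped on every dispatch */
};

#define MODEL_NR_TAGS 4
static struct model_rq model_rqs[MODEL_NR_TAGS];

/* Command id: generation in bits 15:12, tag in bits 11:0. */
static uint16_t model_cid(uint8_t genctr, uint16_t tag)
{
	return (uint16_t)(((genctr & 0xf) << 12) | (tag & 0xfff));
}

/* Dispatch a request on this tag and return the cid given to the device. */
static uint16_t model_dispatch(uint16_t tag)
{
	assert(tag < MODEL_NR_TAGS);
	model_rqs[tag].genctr++;
	model_rqs[tag].state = MODEL_IN_FLIGHT;
	return model_cid(model_rqs[tag].genctr, tag);
}

/* Complete and free the request; the generation is left untouched. */
static void model_complete(uint16_t tag)
{
	assert(tag < MODEL_NR_TAGS);
	model_rqs[tag].state = MODEL_IDLE;
}

/* Lookup; check_state selects whether the in-flight guard is applied. */
static struct model_rq *model_find_rq(uint16_t cid, bool check_state)
{
	uint16_t tag = cid & 0xfff;
	uint8_t genctr = (cid >> 12) & 0xf;
	struct model_rq *rq;

	if (tag >= MODEL_NR_TAGS)
		return NULL;
	rq = &model_rqs[tag];
	if (check_state && rq->state != MODEL_IN_FLIGHT)
		return NULL;	/* stale: nothing outstanding on this tag */
	if ((rq->genctr & 0xf) != genctr)
		return NULL;	/* generation mismatch */
	return rq;
}

/*
 * Dispatch once, keep that cid, then cycle the tag 16 more times so the
 * 4-bit generation wraps back to the same value. The tag ends up idle.
 */
static uint16_t model_replay_after_wrap(uint16_t tag)
{
	uint16_t stale = model_dispatch(tag);

	model_complete(tag);
	for (int i = 0; i < 16; i++) {
		model_dispatch(tag);
		model_complete(tag);
	}
	return stale;
}
```

In this model a completion replayed right after the real one, or replayed
after 16 reuses of the tag, passes the generation check on its own but is
rejected once the state check is applied first.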
To confirm the fix addresses this, I rebuilt the kernel with the patch
and re-ran the same workload for ~10h. The guard now rejects the
offending completion instead of acting on it:
    nvme nvme0: resetting controller
    nvme nvme0: completion for request 0x1c0 not in flight
    nvme nvme0: invalid id 448 completed on queue 2
and no use-after-free / corruption recurred over the run.
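For readers matching up the two numbers in that log: 0x1c0 and 448 are
the same value, printed once in hex and once in decimal. A small sketch of
the decoding, assuming the 12-bit tag / 4-bit generation split from
e7006de6c238 and that the second line prints the full command id (the
sketch_* helper names are made up here):

```c
#include <assert.h>
#include <stdint.h>

/* The two log lines show one 16-bit value in hex and in decimal. */
static_assert(448 == 0x1c0, "0x1c0 and 448 are the same number");

/* Hypothetical helpers mirroring the assumed command id layout. */
static uint16_t sketch_tag_from_cid(uint16_t cid)
{
	return cid & 0xfff;			/* low 12 bits: tag */
}

static uint8_t sketch_genctr_from_cid(uint16_t cid)
{
	return (uint8_t)((cid & 0xf000) >> 12);	/* high 4 bits: generation */
}
```

Under those assumptions, command id 448 decodes to tag 0x1c0 with
generation bits 0.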
The code is unchanged; I'm happy to fold this into the commit message
as a v2 if you'd prefer it spelled out there.
Thanks,
Chao
On Fri, May 22, 2026 at 11:30 AM Chao Shi <coshi036@xxxxxxxxx> wrote:
>
> nvme_find_rq() resolves a device-supplied command id to a request with
> blk_mq_tag_to_rq(), which returns whatever request last used that tag -
> possibly one that is no longer in flight (freed, or never dispatched and
> thus with a NULL rq->mq_hctx). Commit e7006de6c238 ("nvme: code
> command_id with a genctr for use-after-free validation") guards against
> this, but its generation counter is only 4 bits wide and can be matched
> by a malfunctioning or malicious device replaying command ids. The
> driver then completes a request that is not outstanding, dereferencing a
> NULL rq->mq_hctx or double-completing a command:
>
> Oops: general protection fault ... KASAN: null-ptr-deref
> RIP: blk_mq_complete_request_remote+0xe5/0xa80 block/blk-mq.c:1319
> nvme_handle_cqe drivers/nvme/host/pci.c:1418 [inline]
> nvme_poll_cq drivers/nvme/host/pci.c:1449
> nvme_irq drivers/nvme/host/pci.c:1463
>
> Require the request to be in flight before completing it. The check uses
> the request state, so it also covers controllers with
> NVME_QUIRK_SKIP_CID_GEN.
>
> Found by FuzzNvme (Syzkaller with FEMU fuzzing framework).
>
> Acked-by: Sungwoo Kim <iam@xxxxxxxxxxxx>
> Acked-by: Dave Tian <daveti@xxxxxxxxxx>
> Acked-by: Weidong Zhu <weizhu@xxxxxxx>
> Signed-off-by: Chao Shi <coshi036@xxxxxxxxx>
> ---
> drivers/nvme/host/nvme.h | 11 +++++++++++
> 1 file changed, 11 insertions(+)
>
> diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
> index 9a5f28c5103c..3a525c1dc818 100644
> --- a/drivers/nvme/host/nvme.h
> +++ b/drivers/nvme/host/nvme.h
> @@ -615,6 +615,17 @@ static inline struct request *nvme_find_rq(struct blk_mq_tags *tags,
>  			tag);
>  		return NULL;
>  	}
> +	/*
> +	 * blk_mq_tag_to_rq() returns whatever request last used this tag, which
> +	 * may no longer be in flight if the device reports a bogus command id.
> +	 * Completing it would deref a NULL rq->mq_hctx or double-complete a
> +	 * command; the 4-bit genctr below only narrows the window.
> +	 */
> +	if (unlikely(blk_mq_rq_state(rq) != MQ_RQ_IN_FLIGHT)) {
> +		dev_err(nvme_req(rq)->ctrl->device,
> +			"completion for request %#x not in flight\n", tag);
> +		return NULL;
> +	}
>  	if (unlikely(nvme_genctr_mask(nvme_req(rq)->genctr) != genctr)) {
>  		dev_err(nvme_req(rq)->ctrl->device,
>  			"request %#x genctr mismatch (got %#x expected %#x)\n",
> --
> 2.43.0
>