Re: [PATCH v5] nvme: reject passthrough of driver-managed Set Features

From: Christoph Hellwig

Date: Wed May 27 2026 - 10:22:03 EST


On Sat, May 23, 2026 at 06:56:29PM -0400, Chao Shi wrote:
> Since commit b58da2d270db ("nvme: update keep alive interval when kato
> is modified"), userspace can start keep-alive on any transport via a
> Set Features (KATO) passthrough command. nvme_keep_alive_work() then
> allocates with BLK_MQ_REQ_RESERVED, but nvme_alloc_admin_tag_set()
> only reserves admin tags for fabrics, so the allocation trips
> WARN_ON_ONCE() in blk_mq_get_tag() and fails:
>
> nvme nvme0: keep-alive failed: -11
>
> More generally, several Set Features change controller state that the
> driver manages itself and cannot react to correctly when set behind
> its back from userspace. Reject these in nvme_cmd_allowed():
>
> - KATO on non-fabrics (keep-alive is only armed for fabrics; on PCIe
> it has no reserved tag and an active keep-alive harms idle power
> states)
> - Host Behavior Support, Host Memory Buffer, Number of Queues, and
> Autonomous Power State Transition (all driver-managed)
>
> Keep Alive on fabrics is unchanged. I/O commands are unaffected as the
> check is confined to the admin path (ns == NULL).
>
> Link: https://lore.kernel.org/linux-nvme/20260522162639.395802-1-coshi036@xxxxxxxxx/
>
> Fixes: b58da2d270db ("nvme: update keep alive interval when kato is modified")
>
> Found by FuzzNvme(Syzkaller with FEMU fuzzing framework).
>
> Acked-by: Sungwoo Kim <iam@xxxxxxxxxxxx>
> Acked-by: Dave Tian <daveti@xxxxxxxxxx>
> Acked-by: Weidong Zhu <weizhu@xxxxxxx>
> Signed-off-by: Chao Shi <coshi036@xxxxxxxxx>
> ---
>
> Reproducer for the keep-alive case (run as root on a PCIe NVMe device):
>
> #include <fcntl.h>
> #include <stdio.h>
> #include <string.h>
> #include <sys/ioctl.h>
> #include <linux/nvme_ioctl.h>
>
> int main(void)
> {
> 	struct nvme_admin_cmd cmd = {0};
> 	int fd = open("/dev/nvme0", O_RDWR);
>
> 	if (fd < 0) {
> 		perror("open");
> 		return 1;
> 	}
> 	cmd.opcode = 0x09;	/* SET_FEATURES */
> 	cmd.cdw10 = 0x0f;	/* Feature ID: KATO */
> 	cmd.cdw11 = 5;		/* KATO = 5 seconds */
> 	if (ioctl(fd, NVME_IOCTL_ADMIN_CMD, &cmd) < 0) {
> 		perror("ioctl");
> 		return 1;
> 	}
> 	return 0;
> }
>
> On an unpatched kernel, within ~kato/2 seconds after the program exits,
> dmesg shows:
>
> nvme nvme0: keep alive interval updated from 0 ms to 5000 ms
> WARNING: CPU: 0 PID: ... at block/blk-mq-tag.c:148 blk_mq_get_tag+...
> nvme nvme0: keep-alive failed: -11
>
> With this patch the ioctl fails with EACCES on non-fabrics.
>
> Changes since v4:
> - Fold the check into the existing nvme_cmd_allowed() instead of a
> separate helper, and reject additional driver-managed Set Features
> (Host Behavior, Host Memory Buffer, Number of Queues, Autonomous
> Power State Transition) in the same switch (Keith Busch). The admin
> vs I/O distinction is now structural: the switch lives in the
> ns == NULL branch, so I/O commands (e.g. Dataset Management, which
> shares opcode 0x09 with Set Features) are never inspected.
>
> Changes since v3:
> - Only inspect admin commands so a DSM I/O command is not wrongly
> rejected (Keith Busch).
>
> Changes since v2:
> - Reject the KATO passthrough on non-fabrics instead of reserving an
> admin tag for all transports (Keith Busch, Christoph Hellwig).
>
> Changes since v1:
> - v2 added a spec citation and quirk discussion, superseded by the
> reject approach.
>
> drivers/nvme/host/ioctl.c | 33 +++++++++++++++++++++++++++------
> 1 file changed, 27 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/nvme/host/ioctl.c b/drivers/nvme/host/ioctl.c
> index a9c097dacad6..31784506e845 100644
> --- a/drivers/nvme/host/ioctl.c
> +++ b/drivers/nvme/host/ioctl.c
> @@ -14,8 +14,9 @@ enum {
> NVME_IOCTL_PARTITION = (1 << 1),
> };
>
> -static bool nvme_cmd_allowed(struct nvme_ns *ns, struct nvme_command *c,
> - unsigned int flags, bool open_for_write)
> +static bool nvme_cmd_allowed(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
> + struct nvme_command *c, unsigned int flags,
> + bool open_for_write)
> {
> u32 effects;
>
> @@ -50,6 +51,26 @@ static bool nvme_cmd_allowed(struct nvme_ns *ns, struct nvme_command *c,
> case NVME_ID_CNS_CTRL:
> return true;
> }
> + } else if (c->common.opcode == nvme_admin_set_features) {
> + /*
> + * Reject Set Features that change controller state the
> + * driver manages itself; setting them behind the driver's
> + * back from userspace leaves it unable to react correctly.

Overly long lines. I suspect we're best off splitting the admin and
ns-command-set specific parts of nvme_cmd_allowed into separate
helpers. And maybe use a switch statement on the command opcode, as
nested ifs become cumbersome in the long run.
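
Roughly the shape I have in mind, as an untested userspace sketch and not
a patch. Everything prefixed sketch_/SKETCH_ is a made-up stand-in for
the real kernel types and constants (only the opcode and feature ID
values are the ones from the NVMe spec), and the helpers return true
where the real function would carry on with its existing privilege and
command-effects checks:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; values match the NVMe base specification. */
enum {
	SKETCH_ADMIN_SET_FEATURES	= 0x09,
};

enum {
	SKETCH_FEAT_NUM_QUEUES		= 0x07,
	SKETCH_FEAT_AUTO_PST		= 0x0c,
	SKETCH_FEAT_HOST_MEM_BUF	= 0x0d,
	SKETCH_FEAT_KATO		= 0x0f,
	SKETCH_FEAT_HOST_BEHAVIOR	= 0x16,
};

/* Hypothetical, cut-down command: only the fields the sketch looks at. */
struct sketch_cmd {
	uint8_t		opcode;
	uint32_t	cdw10;	/* low byte is the Feature ID */
};

/* Set Features: reject the features the driver manages itself. */
static bool sketch_set_features_allowed(const struct sketch_cmd *c,
					bool is_fabrics)
{
	switch (c->cdw10 & 0xff) {
	case SKETCH_FEAT_KATO:
		/* keep-alive is only armed for fabrics */
		return is_fabrics;
	case SKETCH_FEAT_NUM_QUEUES:
	case SKETCH_FEAT_AUTO_PST:
	case SKETCH_FEAT_HOST_MEM_BUF:
	case SKETCH_FEAT_HOST_BEHAVIOR:
		return false;
	default:
		return true;
	}
}

/* Admin commands: one switch on the opcode instead of nested ifs. */
static bool sketch_admin_cmd_allowed(const struct sketch_cmd *c,
				     bool is_fabrics)
{
	switch (c->opcode) {
	case SKETCH_ADMIN_SET_FEATURES:
		return sketch_set_features_allowed(c, is_fabrics);
	default:
		return true;
	}
}

/* Namespace (I/O) commands: never run the admin-only checks. */
static bool sketch_ns_cmd_allowed(const struct sketch_cmd *c)
{
	(void)c;
	return true;
}

static bool sketch_cmd_allowed(bool has_ns, const struct sketch_cmd *c,
			       bool is_fabrics)
{
	if (has_ns)
		return sketch_ns_cmd_allowed(c);
	return sketch_admin_cmd_allowed(c, is_fabrics);
}
```

With that split a Dataset Management I/O command, which shares opcode
0x09 with Set Features, never reaches the Set Features switch, and each
helper stays short enough to fit in 80 columns.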

> - if (!nvme_cmd_allowed(ns, &c, 0, ioucmd->file->f_mode & FMODE_WRITE))
> + if (!nvme_cmd_allowed(ctrl, ns, &c, 0, ioucmd->file->f_mode & FMODE_WRITE))

Another overly long line here.

Otherwise this looks good.