Re: [PATCH] nvme: bound the freeze drain in passthrough commands

From: Christoph Hellwig

Date: Wed May 27 2026 - 09:33:26 EST


On Wed, May 27, 2026 at 01:59:23AM -0400, Chao Shi wrote:
> nvme_passthru_start() drains in-flight I/O via the unbounded
> nvme_wait_freeze() before submitting a command with command-set
> effects (Format NVM, Sanitize, Namespace Management, vendor unique).
> If a completion is silently dropped or the device hangs, the calling
> task wedges with ctrl->scan_lock and ctrl->subsys->lock held, fanning
> out into hung-task reports on any concurrent open/close/passthru on
> the same controller:
>
> INFO: task syz-executor:NNNN blocked for more than 123 seconds.
> nvme_wait_freeze+0x82/0x100
> nvme_passthru_start drivers/nvme/host/core.c:1249 [inline]
> nvme_submit_user_cmd+0x1ee/0x3d0 drivers/nvme/host/ioctl.c:189
>
> The other freeze-drain sites (pci shutdown, tcp/rdma reset) already
> bound the wait with nvme_wait_freeze_timeout(NVME_IO_TIMEOUT). Apply
> it here too; on timeout, unwind the freeze and return -EBUSY (or
> NVME_SC_INTERNAL on the nvmet path) instead of submitting the command.
>
> Found by FuzzNvme (Syzkaller with FEMU fuzzing framework).

So not blocking forever sounds useful, but this might break existing
uses. I guess we could do it based on the O_NONBLOCK flag if people
really cared.
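
A rough sketch of the O_NONBLOCK idea, as a user-space mock rather than
kernel code. struct mock_ctrl, the stub_* helpers and passthru_freeze()
are hypothetical stand-ins invented for illustration; the only thing
borrowed from the driver is the convention that a bounded freeze wait
returns 0 on timeout. Blocking openers keep the unbounded wait, and only
O_NONBLOCK openers get the bounded wait and -EBUSY:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the controller state; not a kernel type. */
struct mock_ctrl {
	bool device_hung;	/* simulates a silently dropped completion */
	bool frozen;		/* true while the queues are frozen */
	bool used_unbounded;	/* records which wait flavour was taken */
};

static void stub_start_freeze(struct mock_ctrl *c) { c->frozen = true; }
static void stub_unfreeze(struct mock_ctrl *c) { c->frozen = false; }

/* Unbounded wait: the real one would block; the mock only records it. */
static void stub_wait_freeze(struct mock_ctrl *c)
{
	c->used_unbounded = true;
}

/* Bounded wait: returns 0 on timeout, the remaining time otherwise. */
static long stub_wait_freeze_timeout(struct mock_ctrl *c, long timeout)
{
	return c->device_hung ? 0 : timeout;
}

/*
 * Freeze before a command with command-set effects.
 * Returns 0 with the queues left frozen, or -EBUSY with the freeze
 * unwound if an O_NONBLOCK caller's bounded wait timed out.
 */
static int passthru_freeze(struct mock_ctrl *c, unsigned int f_flags,
			   long timeout)
{
	assert(c != NULL);

	stub_start_freeze(c);
	if (!(f_flags & O_NONBLOCK)) {
		/* Existing behaviour for blocking openers stays as is. */
		stub_wait_freeze(c);
		return 0;
	}
	if (!stub_wait_freeze_timeout(c, timeout)) {
		/* Timed out: unwind the freeze, do not submit the command. */
		stub_unfreeze(c);
		return -EBUSY;
	}
	return 0;
}
```

With this split an existing blocking caller sees no change, while a
caller that opened the device with O_NONBLOCK gets -EBUSY back instead
of wedging on a hung device.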

Note that the blocked message itself is not a problem, but around
this time we should have done a controller reset and fixed up the
issue. Does that not happen for your test case?