Re: [PATCH] nvme: remove multipath module parameter

From: John Meneghini
Date: Tue Mar 11 2025 - 23:48:05 EST


On 3/9/25 1:23 PM, Nilay Shroff wrote:
>> It honestly has potential to solve some real problems, like
>> re-enumeration triggered by a link reset on an in-use drive. You'd
>> currently need to close the old handle and open a new one, even though
>> it's the same device. It may not even be possible to do that if that
>> device contains your root partition, and then you can only power cycle.
>>
>> The downside is we wouldn't get the short cut to blk_mq_submit_bio. We'd
>> instead stack that atop an indirect call, so it's not free.

> Yes, agreed; however, it seems the advantages of using an indirect call
> outweigh those of the short cut to blk_mq_submit_bio. Moreover, it seems the
> cost of the indirect call is trivial because we already cache the nexthop.
>
> I integrated your proposed patch (with a few trivial additional changes on top)
> and I see that it's coming out nicely. I ran a few tests and confirmed it's
> working well. However, in the proposed patch we *always* delay (~10 sec) the
Have you tested this with an NVMe-oF controller... yet?

Where did the number 10 seconds come from?

> removal of the multipath head node. That means that even while removing the
> nvme module (rmmod nvme) or if the user deletes/detaches the namespace, we
> delay the removal of the head node, but that may not be what we want. So I'd
> suggest instead that delayed removal of the multipath head node be configurable
> using a sysfs attribute. With this attribute we would let the user opt for
> pinning the head node (with an optional delay time as well?). And it's only
> when the user

So be aware that TP-4129 is adding a CQT parameter which does almost exactly this.

> shows the intent to pin the node that we should delay its removal. This
> (pinning of the head node) is exactly what Christoph's proposed patch
> implements. So I'd suggest a bit of amalgamation of your patch and
> Christoph's patch to implement this change.

Please cc: me on your patches, Nilay. I'd like to test them with my NVMe-oF testbed.

/John