Re: [RFC PATCH] nvme: allow NVME_IOCTL_IO_CMD on controller char dev even when multiple ns
From: Niklas Cassel
Date: Wed Mar 31 2021 - 10:18:53 EST
On Tue, Mar 30, 2021 at 08:30:22PM +0200, javier@xxxxxxxxxxx wrote:
> On 26.03.2021 20:59, Niklas Cassel wrote:
> > From: Niklas Cassel <niklas.cassel@xxxxxxx>
> >
> > Currently when doing NVME_IOCTL_IO_CMD on the controller character device,
> > the command is rejected if there is more than one namespace in the
> > ctrl->namespaces list.
> >
> > There is not really any reason for this restriction.
> > Instead, check the nsid value specified in the passthru command, and try
> > to find the matching namespace in ctrl->namespaces list.
> > If found, call nvme_user_cmd() on the namespace.
> > If not found, reject the command.
> >
> > While at it, remove the warning that says that NVME_IOCTL_IO_CMD is
> > deprecated on the controller character device.
> > There is no comment saying why it is deprecated.
> > It might be very unsafe to send a passthru command, but if that is
> > the issue with this IOCTL, then we should add a warning about that
> > instead.
> >
> > Signed-off-by: Niklas Cassel <niklas.cassel@xxxxxxx>
>
> I think the idea is OK, but I have 3 questions:
>
> 1. Wouldn't this break user-space when nsid is not specified?
Since this is an ioctl, the kernel will always read some value
from cmd.nsid, so I assume you mean when specifying cmd.nsid == 0.
I don't think we have anything to worry about because:
a)
Like Keith said in the other thread:
"There are no IO commands accepting a 0 NSID, so rejecting those from the
driver should be okay."
Currently, when sending a NVME_IOCTL_IO_CMD on the ctrl char dev with
cmd.nsid == 0, we will take the first namespace in the list, use the
request_queue of that namespace, and then send the command there.
Since there are no I/O commands that accept a NSID == 0, whatever you
specified in cmd.opcode, you should get an "Invalid Namespace or Format"
error back from the controller.
I don't think that there is any harm in adding a check (which is essentially
what this RFC does), that will reject the command before sending it down to
the controller.
(A potential improvement in the future, on top of this patch, is to allow
nsid.cmd == broadcast address, but that is out of scope for this patch.
And no, since the current behavior on master does reject any cmd when there
is more than one namespace attached to the controller (more than one ns in
ctrl->namespaces list), I wouldn't say that master handles this case.)
b)
If you use nvme-cli then all commands that calls nvme_submit_io_passthru()
already does:
if (!cfg.namespace_id) {
err = cfg.namespace_id = nvme_get_nsid(fd);
if (err < 0) {
perror("get-namespace-id");
goto close_fd;
}
}
So either if you do a:
nvme write-zeroes -s 0 -c 0 --namespace-id=0 /dev/nvme0
or if you completely omit --namespace-id:
nvme write-zeroes -s 0 -c 0 /dev/nvme0
nvme-cli will already reject it (since the controller char device does not
(and can not) implement NVME_IOCTL_ID).
Sure, nvme-cli is just a single user of this ioctl (there might be other
users), but it is probably the most common one.
If nvme-cli already rejects it in user space, and we concluded that the
controller will reject it, I think it should be safe to reject it also
at the kernel side.
Without this patch, we are already rejecting any command, to any nsid,
if the controller has more than one namespace attached, which I think
makes less sense.
> 2. What is the use case for this? As I understand it, this char device
> is primarily for admin commands sent to the controller. Do you see a
> use case for sending commands to the namespace using the controller
> char device?
I don't have any use case for this, more than allowing to specify whatever
nsid you want in the --namespace-id parameter for nvme, when opening
/dev/nvme0, even when the controller has more than one namespace.
Why allow NVME_IOCTL_IO_CMD on /dev/nvme0 when the controller has one
namespace attached, but not when it has more than one namespace attached?
Doesn't make sense to me.
> 3. Following up on the above, if the use-case is I/O, don't you think
> it is cleaner to use the ongoing per-namespace char device effort? We
> would very much like to get your input there and eventually send a
> series together. When this is merged, we could wire that logic to the
> controller char device if there is an use-case for it.
While I'm not against your per-namespace char device effort, I'm not sure
if clean is the word I would use to decribe creating a bunch of extra
character devices, that almost no one will use, in the root of /dev/ even
(which is what the current proposal suggests.)
Perhaps it would be better if you had to do a
echo $nsid > /sys/class/nvme/nvme0/export_unsupported_namespace
to actually create one of these suggested additional per namespace
character devices. But I do realize that things such as udev rules might
be harder to do, since not all devices would be there automatically.
However, I do agree about using the existing per namespace block devices:
e.g.:
nvme write-zeroes -s 0 -c 0 /dev/nvme0n1
is cleaner than
nvme write-zeroes -s 0 -c 0 --namespace-id=1 /dev/nvme0
And I also agree that if we manage to support IOCTLs on rejected
namespaces, then it would be nice if we could do something like:
nvme write-zeroes -s 0 -c 0 --namespace-id=4 /dev/nvme0
if NSID was a namespace that was rejected/unsupported by the kernel.
Kind regards,
Niklas