Re: [PATCH 0/3] Provide more fine grained control over multipathing

From: Sagi Grimberg
Date: Thu May 31 2018 - 04:37:32 EST



Wouldn't expect you guys to nurture this 'mpath_personality' knob. So
when features like "dispersed namespaces" land, a negative check would
need to be added in the code to prevent switching from "native".

And once something like "dispersed namespaces" lands we'd then have to
see about a more sophisticated switch that operates at a different
granularity. Could also be that switching one subsystem that is part of
"dispersed namespaces" would then cascade to all other associated
subsystems? Not that dissimilar from the 3rd patch in this series that
allows a 'device' switch to be done in terms of the subsystem.

Which I think is broken by allowing this personality to be changed on
the fly.


Anyway, I don't know the end from the beginning on something you just
told me about ;) But we're all in this together. And can take it as it
comes.

I agree but this will be exposed to user-space and we will need to live
with it for a long long time...

I'm merely trying to bridge the gap from old dm-multipath while
native NVMe multipath gets its legs.

In time I really do have aspirations to contribute more to NVMe
multipathing. I think Christoph's NVMe multipath implementation of a
bio-based device on top of NVMe core's blk-mq device(s) is very clever
and effective (blk_steal_bios() hack and all).

That's great.

Don't get me wrong, I do support your cause, and I think nvme should try
to help, I just think that subsystem granularity is not the correct
approach going forward.

I understand there will be limits to this 'mpath_personality' knob's
utility and it'll need to evolve over time. But the burden of making
more advanced NVMe multipath features accessible outside of native NVMe
isn't intended to be on any of the NVMe maintainers (other than maybe
remembering to disallow the switch where it makes sense in the future).

I would expect that any "advanced multipath features" would be properly
brought up with the NVMe TWG as a ratified standard and find its way
to nvme. So I don't think this particularly is a valid argument.

As I said, I've been off the grid, can you remind me why a global knob
is not sufficient?

Because once nvme_core.multipath=N is set: native NVMe multipath is then
not accessible from the same host. The goal of this patchset is to give
users choice. But not limit them to _only_ using dm-multipath if they
just have some legacy needs.

Tough to be convincing with hypotheticals but I could imagine a very
obvious use case for native NVMe multipathing to be PCI-based embedded
NVMe "fabrics" (especially if/when the numa-based path selector lands).
But the same host with PCI NVMe could be connected to a FC network that
has historically always been managed via dm-multipath. Say that
FC-based infrastructure gets updated to use NVMe (to leverage a wider
NVMe investment, whatever?) -- maybe admins would still prefer to
use dm-multipath for the NVMe over FC.

You are referring to an array exposing media via nvmf and scsi
simultaneously? I'm not sure that there is a clean definition of
how that is supposed to work (ANA/ALUA, reservations, etc..)

This might sound stupid to you, but can't users that desperately must
keep using dm-multipath (for its mature toolset or what-not) just
stack it on top of the multipath nvme device? (I might be completely
off on this so feel free to correct my ignorance).

We could certainly pursue adding multipath-tools support for native NVMe
multipathing. Not opposed to it (even if just reporting topology and
state). But given the extensive lengths NVMe multipath goes to hide
devices we'd need some way to pierce through the opaque nvme device
that native NVMe multipath exposes. But that really is a tangent
relative to this patchset. Since that kind of visibility would also
benefit the nvme cli... otherwise how are users to even be able to trust
but verify native NVMe multipathing did what they expected it to?

Can you explain what is missing for multipath-tools to resolve topology?

nvme list-subsys is doing just that, isn't it? It only lists subsys-ctrl
topology, but that is the important information, as controllers
are the real paths.
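
For illustration only, here is a minimal sketch of how the subsys-ctrl
topology could be gathered from sysfs. It is not how nvme-cli or
multipath-tools implement this. It assumes each subsystem shows up as a
directory under /sys/class/nvme-subsystem with a 'subsysnqn' attribute,
and that each controller entry (nvme<N>) carries 'transport', 'address'
and 'state' attributes. The names list_subsys() and _read_attr() are
hypothetical helpers invented for this example.

```python
import os


def _read_attr(path):
    """Return the stripped contents of a sysfs attribute, or None if unreadable."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return None


def list_subsys(root="/sys/class/nvme-subsystem"):
    """Map each subsystem name to its NQN and its controllers (the paths).

    The directory layout under 'root' is an assumption of this sketch.
    """
    topology = {}
    if not os.path.isdir(root):
        return topology
    for subsys in sorted(os.listdir(root)):
        subsys_dir = os.path.join(root, subsys)
        if not os.path.isdir(subsys_dir):
            continue
        entry = {
            "nqn": _read_attr(os.path.join(subsys_dir, "subsysnqn")),
            "ctrls": [],
        }
        for name in sorted(os.listdir(subsys_dir)):
            ctrl_dir = os.path.join(subsys_dir, name)
            # Controllers are named nvme<N>; entries such as nvme0n1 are
            # namespaces, not paths, and are skipped.
            is_ctrl = name.startswith("nvme") and name[4:].isdigit()
            if not (is_ctrl and os.path.isdir(ctrl_dir)):
                continue
            entry["ctrls"].append({
                "name": name,
                "transport": _read_attr(os.path.join(ctrl_dir, "transport")),
                "address": _read_attr(os.path.join(ctrl_dir, "address")),
                "state": _read_attr(os.path.join(ctrl_dir, "state")),
            })
        topology[subsys] = entry
    return topology
```

Calling list_subsys() with no arguments reads the live sysfs tree; passing
a different root lets the same walk run against a captured copy of the
tree.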