RE: [PATCH v2 0/2] Simplify mtty driver and mdev core

From: Parav Pandit
Date: Tue Aug 20 2019 - 04:58:15 EST


+ Dave.

Hi Jiri, Dave, Alex, Kirti, Cornelia,

Please provide your feedback on it, how shall we proceed?

Short summary of requirements.
For a given mdev (mediated device [1]), there is one representor netdevice and devlink port in switchdev mode (similar to SR-IOV VF),
And there is one netdevice for the actual mdev when mdev is probed.

(a) representor netdev and devlink port should be able derive phys_port_name().
So that representor netdev name can be built deterministically across reboots.

(b) for mdev's netdevice, mdev's device should have an attribute.
This attribute can be used by udev rules/systemd or something else to rename netdev name deterministically.

(c) IFNAMSIZ of 16 bytes is too small to fit whole UUID.
A simple grep IFNAMSIZ in stack hints hundreds of users of IFNAMSIZ in drivers, uapi, netlink, boot config area and more.
Changing IFNAMSIZ for a mdev bus doesn't really look reasonable option to me.

Hence, I would like to discuss below options.

Option-1: mdev index
Introduce an optional mdev index/handle as u32 during mdev create time.
User passes mdev index/handle as input.

phys_port_name=mIndex=m%u
mdev_index will be available in sysfs as mdev attribute for udev to name the mdev's netdev.

example mdev create command:
UUID=$(uuidgen)
echo $UUID index=10 > /sys/class/net/ens2f0/mdev_supported_types/mlx5_core_mdev/create
example netdevs:
repnetdev=ens2f0_m10 /*ens2f0 is parent PF's netdevice */
mdev_netdev=enm10

Pros:
1. mdevctl and any other existing tools are unaffected.
2. netdev stack, ovs and other switching platforms are unaffected.
3. achieves unique phys_port_name for representor netdev
4. achieves unique mdev eth netdev name for the mdev using udev/systemd extension.
5. Aligns well with mdev and netdev subsystem and similar to existing sriov bdf's.

Option-2: shorter mdev name
Extend mdev to have shorter mdev device name in addition to UUID.
such as 'foo', 'bar'.
Mdev will continue to have UUID.
phys_port_name=mdev_name

Pros:
1. All same as option-1, except mdevctl needs upgrade for newer usage.
It is common practice to upgrade iproute2 package along with the kernel.
Similar practice to be done with mdevctl.
2. Newer users of mdevctl who wants to work with non_UUID names, will use newer mdevctl/tools.
Cons:
1. Dual naming scheme of mdev might affect some of the existing tools.
It's unclear how/if it actually affects.
mdevctl [2] is very recently developed and can be enhanced for dual naming scheme.

Option-3: mdev uuid alias
Instead of shorter mdev name or mdev index, have alpha-numeric name alias.
Alias is an optional mdev sysfs attribute such as 'foo', 'bar'.
example mdev create command:
UUID=$(uuidgen)
echo $UUID alias=foo > /sys/class/net/ens2f0/mdev_supported_types/mlx5_core_mdev/create
example netdevs:
examle netdevs:
repnetdev = ens2f0_mfoo
mdev_netdev=enmfoo

Pros:
1. All same as option-1.
2. Doesn't affect existing mdev naming scheme.
Cons:
1. Index scheme of option-1 is better which can number large number of mdevs with fewer characters, simplifying the management tool.

Option-4: extend IFNAMESZ to be 64 bytes Extended IFNAMESZ from 16 to 64 bytes phys_port_name=mdev_UUID_string mdev_netdev_name=enmUUID

Pros:
1. Doesn't require mdev extension
Cons:
1. netdev stack, driver, uapi, user space, boot config wide changes
2. Possible user space extensions who assumed name size being 16 characters
3. Single device type demands namesize change for all netdev types

[1] https://www.kernel.org/doc/Documentation/vfio-mediated-device.txt
[2] https://github.com/mdevctl/mdevctl

Regards,
Parav Pandit

> -----Original Message-----
> From: linux-kernel-owner@xxxxxxxxxxxxxxx <linux-kernel-
> owner@xxxxxxxxxxxxxxx> On Behalf Of Parav Pandit
> Sent: Wednesday, August 14, 2019 9:51 PM
> To: Alex Williamson <alex.williamson@xxxxxxxxxx>
> Cc: Cornelia Huck <cohuck@xxxxxxxxxx>; Kirti Wankhede
> <kwankhede@xxxxxxxxxx>; kvm@xxxxxxxxxxxxxxx; linux-
> kernel@xxxxxxxxxxxxxxx; cjia@xxxxxxxxxx; Jiri Pirko <jiri@xxxxxxxxxxxx>;
> netdev@xxxxxxxxxxxxxxx
> Subject: RE: [PATCH v2 0/2] Simplify mtty driver and mdev core
>
>
>
> > -----Original Message-----
> > From: Alex Williamson <alex.williamson@xxxxxxxxxx>
> > Sent: Wednesday, August 14, 2019 8:28 PM
> > To: Parav Pandit <parav@xxxxxxxxxxxx>
> > Cc: Cornelia Huck <cohuck@xxxxxxxxxx>; Kirti Wankhede
> > <kwankhede@xxxxxxxxxx>; kvm@xxxxxxxxxxxxxxx; linux-
> > kernel@xxxxxxxxxxxxxxx; cjia@xxxxxxxxxx; Jiri Pirko
> > <jiri@xxxxxxxxxxxx>; netdev@xxxxxxxxxxxxxxx
> > Subject: Re: [PATCH v2 0/2] Simplify mtty driver and mdev core
> >
> > On Wed, 14 Aug 2019 13:45:49 +0000
> > Parav Pandit <parav@xxxxxxxxxxxx> wrote:
> >
> > > > -----Original Message-----
> > > > From: Cornelia Huck <cohuck@xxxxxxxxxx>
> > > > Sent: Wednesday, August 14, 2019 6:39 PM
> > > > To: Parav Pandit <parav@xxxxxxxxxxxx>
> > > > Cc: Alex Williamson <alex.williamson@xxxxxxxxxx>; Kirti Wankhede
> > > > <kwankhede@xxxxxxxxxx>; kvm@xxxxxxxxxxxxxxx; linux-
> > > > kernel@xxxxxxxxxxxxxxx; cjia@xxxxxxxxxx; Jiri Pirko
> > > > <jiri@xxxxxxxxxxxx>; netdev@xxxxxxxxxxxxxxx
> > > > Subject: Re: [PATCH v2 0/2] Simplify mtty driver and mdev core
> > > >
> > > > On Wed, 14 Aug 2019 12:27:01 +0000 Parav Pandit
> > > > <parav@xxxxxxxxxxxx> wrote:
> > > >
> > > > > + Jiri, + netdev
> > > > > To get perspective on the ndo->phys_port_name for the
> > > > > representor netdev
> > > > of mdev.
> > > > >
> > > > > Hi Cornelia,
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Cornelia Huck <cohuck@xxxxxxxxxx>
> > > > > > Sent: Wednesday, August 14, 2019 1:32 PM
> > > > > > To: Parav Pandit <parav@xxxxxxxxxxxx>
> > > > > > Cc: Alex Williamson <alex.williamson@xxxxxxxxxx>; Kirti
> > > > > > Wankhede <kwankhede@xxxxxxxxxx>; kvm@xxxxxxxxxxxxxxx; linux-
> > > > > > kernel@xxxxxxxxxxxxxxx; cjia@xxxxxxxxxx
> > > > > > Subject: Re: [PATCH v2 0/2] Simplify mtty driver and mdev core
> > > > > >
> > > > > > On Wed, 14 Aug 2019 05:54:36 +0000 Parav Pandit
> > > > > > <parav@xxxxxxxxxxxx> wrote:
> > > > > >
> > > > > > > > > I get that part. I prefer to remove the UUID itself from
> > > > > > > > > the structure and therefore removing this API makes lot
> > > > > > > > > more
> > sense?
> > > > > > > >
> > > > > > > > Mdev and support tools around mdev are based on UUIDs
> > > > > > > > because it's
> > > > > > defined
> > > > > > > > in the documentation.
> > > > > > > When we introduce newer device naming scheme, it will update
> > > > > > > the
> > > > > > documentation also.
> > > > > > > May be that is the time to move to .rst format too.
> > > > > >
> > > > > > You are aware that there are existing tools that expect a uuid
> > > > > > naming scheme, right?
> > > > > >
> > > > > Yes, Alex mentioned too.
> > > > > The good tool that I am aware of is [1], which is 4 months old.
> > > > > Not sure if it is
> > > > part of any distros yet.
> > > > >
> > > > > README also says, that it is in 'early in development. So we
> > > > > have scope to
> > > > improve it for non UUID names, but lets discuss that more below.
> > > >
> > > > The up-to-date reference for mdevctl is
> > > > https://github.com/mdevctl/mdevctl. There is currently an effort
> > > > to get this packaged in Fedora.
> > > >
> > > Awesome.
> > >
> > > > >
> > > > > > >
> > > > > > > > I don't think it's as simple as saying "voila, UUID
> > > > > > > > dependencies are removed, users are free to use arbitrary
> > > > > > > > strings". We'd need to create some kind of naming policy,
> > > > > > > > what characters are allows so that we can potentially
> > > > > > > > expand the creation parameters as has been proposed a
> > > > > > > > couple times, how do we deal with collisions and races,
> > > > > > > > and why should we make such a change when a UUID is a
> > > > > > > > perfectly reasonable devices name. Thanks,
> > > > > > > >
> > > > > > > Sure, we should define a policy on device naming to be more
> relaxed.
> > > > > > > We have enough examples in-kernel.
> > > > > > > Few that I am aware of are netdev (vxlan, macvlan, ipvlan,
> > > > > > > lot more), rdma
> > > > > > etc which has arbitrary device names and ID based device names.
> > > > > > >
> > > > > > > Collisions and race is already taken care today in the mdev core.
> > > > > > > Same
> > > > > > unique device names continue.
> > > > > >
> > > > > > I'm still completely missing a rationale _why_ uuids are
> > > > > > supposedly bad/restricting/etc.
> > > > > There is nothing bad about uuid based naming.
> > > > > Its just too long name to derive phys_port_name of a netdev.
> > > > > In details below.
> > > > >
> > > > > For a given mdev of networking type, we would like to have
> > > > > (a) representor netdevice [2]
> > > > > (b) associated devlink port [3]
> > > > >
> > > > > Currently these representor netdevice exist only for the PCIe SR-IOV
> VFs.
> > > > > It is further getting extended for mdev without SR-IOV.
> > > > >
> > > > > Each of the devlink port is attached to representor netdevice [4].
> > > > >
> > > > > This netdevice phys_port_name should be a unique derived from
> > > > > some
> > > > property of mdev.
> > > > > Udev/systemd uses phys_port_name to derive unique representor
> > > > > netdev
> > > > name.
> > > > > This netdev name is further use by orchestration and switching
> > > > > software in
> > > > user space.
> > > > > One such distro supported switching software is ovs [4], which
> > > > > relies on the
> > > > persistent device name of the representor netdevice.
> > > >
> > > > Ok, let me rephrase this to check that I understand this correctly.
> > > > I'm not sure about some of the terms you use here (even after
> > > > looking at the linked doc/code), but that's probably still ok.
> > > >
> > > > We want to derive an unique (and probably persistent?) netdev name
> > > > so that userspace can refer to a representor netdevice. Makes sense.
> > > > For generating that name, udev uses the phys_port_name (which
> > > > represents the devlink port, IIUC). Also makes sense.
> > > >
> > > You understood it correctly.
> > >
> > > > >
> > > > > phys_port_name has limitation to be only 15 characters long.
> > > > > UUID doesn't fit in phys_port_name.
> > > >
> > > > Understood. But why do we need to derive the phys_port_name from
> > > > the mdev device name? This netdevice use case seems to be just one
> > > > use case for using mdev devices? If this is a specialized mdev
> > > > type for this setup, why not just expose a shorter identifier via an extra
> attribute?
> > > >
> > > Representor netdev, represents mdev's switch port (like PCI SRIOV
> > > VF's switch
> > port).
> > > So user must be able to relate this two objects in similar manner as
> > > SRIOV
> > VFs.
> > > Phys_port_name is derived from the PCI PF and VF numbering scheme.
> > > Similarly mdev's such port should be derived from mdev's
> id/name/attribute.
> > >
> > > > > Longer UUID names are creating snow ball effect, not just in
> > > > > networking stack
> > > > but many user space tools too.
> > > >
> > > > This snowball effect mainly comes from the device name ->
> > > > phys_port_name setup, IIUC.
> > > >
> > > Right.
> > >
> > > > > (as opposed to recently introduced mdevctl, are they more mdev
> > > > > tools which has dependency on UUID name?)
> > > >
> > > > I am aware that people have written scripts etc. to manage their mdevs.
> > > > Given that the mdev infrastructure has been around for quite some
> > > > time, I'd say the chance of some of those scripts relying on uuid
> > > > names is
> > non-zero.
> > > >
> > > Ok. but those scripts have never managed networking devices.
> > > So those scripts won't break because they will always create mdev
> > > devices
> > using UUID.
> > > When they use these new networking devices, they need more things
> > > than
> > their scripts.
> > > So user space upgrade for such mixed mode case is reasonable.
> >
> > Tools like mdevctl are agnostic of the type of mdev device they're
> > managing, it shouldn't matter than they've never managed a networking
> > mdev previously, it follows the standards of mdev management.
> >
> > > > >
> > > > > Instead of mdev subsystem creating such effect, one option we
> > > > > are
> > > > considering is to have shorter mdev names.
> > > > > (Similar to netdev, rdma, nvme devices).
> > > > > Such as mdev1, mdev2000 etc.
> >
> > Note that these are kernel generated names, as are the other examples.
> No. I probably gave the wrong examples.
> Mdev user provided names can be 'foo', 'bar', 'foo1'.
>
> > In the case of mdev, the user is providing the UUID, which becomes the
> > device name. When a user writes to the create attribute, there needs
> > to be determinism that the user can identify the device they created
> > vs another that may have been created concurrently. I don't see that
> > we can put users in the path of managing device instance numbers.
> No. Its just user provided names.
>
> >
> > > > > Second option I was considering is to have an optional alias for
> > > > > UUID based
> > > > mdev.
> > > > > This name alias is given at time of mdev creation.
> > > > > Devlink port's phys_port_name is derived out of this shorter
> > > > > mdev name
> > > > alias.
> > > > > This way, mdev remains to be UUID based with optional extension.
> > > > > However, I prefer first option to relax mdev naming scheme.
> > > >
> > > > Actually, I think that second option makes much more sense, as you
> > > > avoid potentially breaking existing tooling.
> > > Let's first understand of what exactly will break with existing tool
> > > if they see non_uuid based device.
> >
> > Do we really want a mixed namespace of device names, some UUID, some...
> > something else? That seems like a mess.
> >
> So you prefer alias as an attribute? If so, it should be an optional additional
> parameter during create time, because it is desired to not invent new callbacks
> for such attributes setting and (and rewrite them).
>
> > > Existing tooling continue to work with UUID devices.
> > > Do you have example of what can break if they see non_uuid based
> > > device name? I think you are clear, but to be sure, UUID based
> > > creation will continue to be there. Optionally mdev will be created
> > > with alpha-numeric string, if we don't it as additional attribute.
> >
> > I'm not onboard with a UUID being just one of the possible naming
> > strings via which we can create mdev devices. I think that becomes
> > untenable for userspace. I don't think a sufficient argument has been
> > made against the alias approach, which seems to keep the UUID as a
> > canonical name, providing a consistent namespace, augmented with user or
> kernel provided short alias.
> > Thanks,
> >
> If I understand you correctly, you prefer alias name approach to keep UUID
> naming scheme intact in mdev?
>
> > Alex
> >
> > > > >
> > > > > > We want to uniquely identify a device, across different types
> > > > > > of vendor drivers. An uuid is a unique identifier and even a
> > > > > > well-defined one. Tools (e.g. mdevctl) are relying on it for
> > > > > > mdev devices
> > > > today.
> > > > > >
> > > > > > What is the problem you're trying to solve?
> > > > > Unique device naming is still achieved without UUID scheme by
> > > > > various
> > > > subsystems in kernel using alpha-numeric string.
> > > > > Having such string based continue to provide unique names.
> > > > >
> > > > > I hope I described the problem and two solutions above.
> > > > >
> > > > > [1] https://github.com/awilliam/mdevctl
> > > > > [2]
> > > > > https://elixir.bootlin.com/linux/v5.3-rc4/source/drivers/net/eth
> > > > > er
> > > > > net/
> > > > > mellanox/mlx5/core/en_rep.c [3]
> > > > > http://man7.org/linux/man-pages/man8/devlink-port.8.html
> > > > > [4]
> > > > > https://elixir.bootlin.com/linux/v5.3-rc4/source/net/core/devlink.
> > > > > c#L6
> > > > > 921
> > > > > [5] https://www.openvswitch.org/
> > > > >
> > >