RE: [RFC net-next 0/8] Introducing subdev bus and devlink extension
From: Parav Pandit
Date: Wed Mar 06 2019 - 00:43:16 EST
Hi Kirti,
> -----Original Message-----
> From: Kirti Wankhede <kwankhede@xxxxxxxxxx>
> Sent: Tuesday, March 5, 2019 9:51 PM
> To: Parav Pandit <parav@xxxxxxxxxxxx>; Jakub Kicinski
> <jakub.kicinski@xxxxxxxxxxxxx>
> Cc: Or Gerlitz <gerlitz.or@xxxxxxxxx>; netdev@xxxxxxxxxxxxxxx; linux-
> kernel@xxxxxxxxxxxxxxx; michal.lkml@xxxxxxxxxxx; davem@xxxxxxxxxxxxx;
> gregkh@xxxxxxxxxxxxxxxxxxx; Jiri Pirko <jiri@xxxxxxxxxxxx>
> Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink extension
>
>
>
> On 3/6/2019 6:14 AM, Parav Pandit wrote:
> > Hi Greg, Kirti,
> >
> >> -----Original Message-----
> >> From: Parav Pandit
> >> Sent: Tuesday, March 5, 2019 5:45 PM
> >> To: Parav Pandit <parav@xxxxxxxxxxxx>; Kirti Wankhede
> >> <kwankhede@xxxxxxxxxx>; Jakub Kicinski
> <jakub.kicinski@xxxxxxxxxxxxx>
> >> Cc: Or Gerlitz <gerlitz.or@xxxxxxxxx>; netdev@xxxxxxxxxxxxxxx; linux-
> >> kernel@xxxxxxxxxxxxxxx; michal.lkml@xxxxxxxxxxx;
> davem@xxxxxxxxxxxxx;
> >> gregkh@xxxxxxxxxxxxxxxxxxx; Jiri Pirko <jiri@xxxxxxxxxxxx>
> >> Subject: RE: [RFC net-next 0/8] Introducing subdev bus and devlink
> >> extension
> >>
> >>
> >>
> >>> -----Original Message-----
> >>> From: linux-kernel-owner@xxxxxxxxxxxxxxx <linux-kernel-
> >>> owner@xxxxxxxxxxxxxxx> On Behalf Of Parav Pandit
> >>> Sent: Tuesday, March 5, 2019 5:17 PM
> >>> To: Kirti Wankhede <kwankhede@xxxxxxxxxx>; Jakub Kicinski
> >>> <jakub.kicinski@xxxxxxxxxxxxx>
> >>> Cc: Or Gerlitz <gerlitz.or@xxxxxxxxx>; netdev@xxxxxxxxxxxxxxx;
> >>> linux- kernel@xxxxxxxxxxxxxxx; michal.lkml@xxxxxxxxxxx;
> >>> davem@xxxxxxxxxxxxx; gregkh@xxxxxxxxxxxxxxxxxxx; Jiri Pirko
> >>> <jiri@xxxxxxxxxxxx>
> >>> Subject: RE: [RFC net-next 0/8] Introducing subdev bus and devlink
> >>> extension
> >>>
> >>> Hi Kirti,
> >>>
> >>>> -----Original Message-----
> >>>> From: Kirti Wankhede <kwankhede@xxxxxxxxxx>
> >>>> Sent: Tuesday, March 5, 2019 4:40 PM
> >>>> To: Parav Pandit <parav@xxxxxxxxxxxx>; Jakub Kicinski
> >>>> <jakub.kicinski@xxxxxxxxxxxxx>
> >>>> Cc: Or Gerlitz <gerlitz.or@xxxxxxxxx>; netdev@xxxxxxxxxxxxxxx;
> >>>> linux- kernel@xxxxxxxxxxxxxxx; michal.lkml@xxxxxxxxxxx;
> >>>> davem@xxxxxxxxxxxxx; gregkh@xxxxxxxxxxxxxxxxxxx; Jiri Pirko
> >>>> <jiri@xxxxxxxxxxxx>
> >>>> Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink
> >>>> extension
> >>>>
> >>>>
> >>>>
> >>>>> I am novice at mdev level too. mdev or vfio mdev.
> >>>>> Currently by default we bind to same vendor driver, but when it
> >>>>> was
> >>>> created as passthrough device, vendor driver won't create netdevice
> >>>> or rdma device for it.
> >>>>> And vfio/mdev or whatever mature available driver would bind at
> >>>>> that
> >>>> point.
> >>>>>
> >>>>
> >>>> Using mdev framework, if you want to partition a physical device
> >>>> into multiple logic devices, you can bind those devices to same
> >>>> vendor driver through vfio-mdev, where as if you want to
> >>>> passthrough the device bind it to vfio-pci. If I understand
> >>>> correctly, that is what you are
> >>> looking for.
> >>>>
> >>>>
> >>> We cannot bind a whole PCI device to vfio-pci, reason is, A given
> >>> PCI device has existing protocol devices on it such as netdevs and rdma
> dev.
> >>> This device is partitioned while those protocol devices exist and
> >>> mlx5_core, mlx5_ib drivers are loaded on it.
> >>> And we also need to connect these objects rightly to eswitch exposed
> >>> by devlink interface (net/core/devlink.c) that supports eswitch
> >>> binding, health, registers, parameters, ports support.
> >>> It also supports existing PCI VFs.
> >>>
> >>> I donât think we want to replicate all of this again in mdev subsystem [1].
> >>>
> >>> [1]
> >>> https://www.kernel.org/doc/Documentation/vfio-mediated-device.txt
> >>>
> >>> So devlink interface to migrate users from managing VFs to non_VF
> >>> sub device is natural progression.
> >>>
> >>> However, in future, I believe we would be creating mediated devices
> >>> on user request, to use mdev modules and map them to VM.
> >>>
> >>> Also 'mdev_bus' is created as a class and not as a bus. This limits
> >>> to not use devlink interface whose handle is bus+device name.
> >>>
> >>> So one option is to change mdev from class to bus.
> >>> devlink will create mdevs on the bus, mdev driver can probe these
> >>> devices on host system by default.
> >>> And if told to do passthrough, a different driver exposes them to VM.
> >>> How feasible is this?
> >>>
> >> Wait, I do see a mdev bus and mdevs are created on this bus using
> >> mdev_device_create().
> >> So how about we create mdevs on this bus using devlink, instead of sysfs?
> >> And driver side on host gets the mdev_register_driver()->probe()?
> >>
> >
> > Thinking more and reviewing more mdev code, I believe mdev fits this
> > need a lot better than new subdev bus, mfd, platform device, or devlink
> subport.
> > For coming future, to map this sub device (mdev) to VM will also be easier
> by using mdev bus.
> >
>
> Thanks for taking close look at mdev code.
>
> Assigning mdev to VM support is already in place, QEMU and libvirt have
> support to assign mdev device to VM.
>
> > I also believe we can use the sysfs interface for mdev life cycle.
> > Here when mdev are created it will register as devlink instance and
> > will be able to query/config parameters before driver probe the device.
> > (instead of having life cycle via devlink)
> >
> > Few enhancements would be needed for mdev side.
> > 1. making iommu optional.
>
> Currently mdev devices are not IOMMU aware, vendor driver is responsible
> for programming IOMMU for mdev device, if required.
> IOMMU aware mdev device patch set is almost reviewed and ready to get
> pulled. This is optional, vendor driver have to decide whether mdev device
> should be associated with its parents IOMMU or not. I'm testing it and I
> think Alex is on vacation and this will get pulled when Alex will be back from
> vacation.
> https://lwn.net/Articles/779650/
>
> > 2. configuring mdev device parameters during creation time
> >
>
> Mdev framework provides a way to define multiple types for creation
> through sysfs. You can define multiple types rather than having creation
> time parameter and on creation accordingly update 'available_instances'.
> Mdev also provides a way to provide vendor-specific-attributes for parent
> physical device as well as for created mdev device. You can add sysfs
> interface to get input parameters for a mdev device which can be used by
> vendor driver when open() on that mdev device is called.
>
> Thanks,
> Kirti
Yes. I got my patches to adapt to mdev way. Will be posting RFC v2 soon.
Will wait for a day to receive more comments/views from Greg and others.
As I explained in this cover-letter and discussion,
First use case is to create and use mdevs in the host (and not in VM).
Later on, I am sure once we have mdevs available, VM users will likely use it.
So, mlx5_core driver will have two components as starting point.
1. drivers/net/ethernet/mellanox/mlx5/core/mdev/mdev.c
This is mdev device life cycle driver which will do, mdev_register_device() and implements mlx5_mdev_ops.
2. drivers/net/ethernet/mellanox/mlx5/core/mdev/mdev_driver.c
This is mdev device driver which does mdev_register_driver()
and probe() creates netdev by heavily reusing existing code of the PF device.
These drivers will not be placed under drivers/vfio/mdev, because this is not a vfio driver.
This is fine, right?
Given that this is net driver, we will be submitting patches,
through netdev mailing list through Dave Miller's net-next tree.
And CC kvm@xxxxxxxxxxxxxxx, you and others as usual.
Are you ok, merging code this way as mdev device creator and mdev driver.
Yes?