Re: [RFC net-next 0/8] Introducing subdev bus and devlink extension

From: Kirti Wankhede
Date: Tue Mar 05 2019 - 22:52:31 EST




On 3/6/2019 6:14 AM, Parav Pandit wrote:
> Hi Greg, Kirti,
>
>> -----Original Message-----
>> From: Parav Pandit
>> Sent: Tuesday, March 5, 2019 5:45 PM
>> To: Parav Pandit <parav@xxxxxxxxxxxx>; Kirti Wankhede
>> <kwankhede@xxxxxxxxxx>; Jakub Kicinski <jakub.kicinski@xxxxxxxxxxxxx>
>> Cc: Or Gerlitz <gerlitz.or@xxxxxxxxx>; netdev@xxxxxxxxxxxxxxx; linux-
>> kernel@xxxxxxxxxxxxxxx; michal.lkml@xxxxxxxxxxx; davem@xxxxxxxxxxxxx;
>> gregkh@xxxxxxxxxxxxxxxxxxx; Jiri Pirko <jiri@xxxxxxxxxxxx>
>> Subject: RE: [RFC net-next 0/8] Introducing subdev bus and devlink extension
>>
>>
>>
>>> -----Original Message-----
>>> From: linux-kernel-owner@xxxxxxxxxxxxxxx <linux-kernel-
>>> owner@xxxxxxxxxxxxxxx> On Behalf Of Parav Pandit
>>> Sent: Tuesday, March 5, 2019 5:17 PM
>>> To: Kirti Wankhede <kwankhede@xxxxxxxxxx>; Jakub Kicinski
>>> <jakub.kicinski@xxxxxxxxxxxxx>
>>> Cc: Or Gerlitz <gerlitz.or@xxxxxxxxx>; netdev@xxxxxxxxxxxxxxx; linux-
>>> kernel@xxxxxxxxxxxxxxx; michal.lkml@xxxxxxxxxxx; davem@xxxxxxxxxxxxx;
>>> gregkh@xxxxxxxxxxxxxxxxxxx; Jiri Pirko <jiri@xxxxxxxxxxxx>
>>> Subject: RE: [RFC net-next 0/8] Introducing subdev bus and devlink
>>> extension
>>>
>>> Hi Kirti,
>>>
>>>> -----Original Message-----
>>>> From: Kirti Wankhede <kwankhede@xxxxxxxxxx>
>>>> Sent: Tuesday, March 5, 2019 4:40 PM
>>>> To: Parav Pandit <parav@xxxxxxxxxxxx>; Jakub Kicinski
>>>> <jakub.kicinski@xxxxxxxxxxxxx>
>>>> Cc: Or Gerlitz <gerlitz.or@xxxxxxxxx>; netdev@xxxxxxxxxxxxxxx;
>>>> linux-kernel@xxxxxxxxxxxxxxx; michal.lkml@xxxxxxxxxxx;
>>>> davem@xxxxxxxxxxxxx; gregkh@xxxxxxxxxxxxxxxxxxx; Jiri Pirko
>>>> <jiri@xxxxxxxxxxxx>
>>>> Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink
>>>> extension
>>>>
>>>>
>>>>
>>>>> I am a novice at the mdev level too (mdev or vfio-mdev).
>>>>> Currently, by default, we bind to the same vendor driver, but when it
>>>>> is created as a passthrough device, the vendor driver won't create a
>>>>> netdevice or rdma device for it.
>>>>> vfio-mdev, or whatever mature driver is available, would bind at that
>>>>> point.
>>>>>
>>>>
>>>> Using the mdev framework, if you want to partition a physical device
>>>> into multiple logical devices, you can bind those devices to the same
>>>> vendor driver through vfio-mdev, whereas if you want to pass through
>>>> the whole device, bind it to vfio-pci. If I understand correctly, that
>>>> is what you are looking for.
>>>>
>>>>
>>> We cannot bind a whole PCI device to vfio-pci, because a given PCI
>>> device already has protocol devices on it, such as netdevs and rdma
>>> devices. This device is partitioned while those protocol devices exist
>>> and the mlx5_core and mlx5_ib drivers are loaded on it.
>>> We also need to connect these objects correctly to the eswitch exposed
>>> by the devlink interface (net/core/devlink.c), which supports eswitch
>>> binding, health, registers, parameters, and ports.
>>> It also supports existing PCI VFs.
>>>
>>> I don't think we want to replicate all of this again in the mdev subsystem [1].
>>>
>>> [1] https://www.kernel.org/doc/Documentation/vfio-mediated-device.txt
>>>
>>> So using the devlink interface to migrate users from managing VFs to
>>> managing non-VF sub devices is a natural progression.
>>>
>>> However, in the future, I believe we would create mediated devices on
>>> user request, to use the mdev modules and map them to a VM.
>>>
>>> Also, 'mdev_bus' is created as a class and not as a bus. This prevents
>>> us from using the devlink interface, whose handle is the bus+device name.
>>>
>>> So one option is to change mdev from a class to a bus.
>>> devlink would create mdevs on the bus, and an mdev driver could probe
>>> these devices on the host system by default.
>>> If told to do passthrough, a different driver would expose them to the VM.
>>> How feasible is this?
>>>
>> Wait, I do see an mdev bus, and mdevs are created on this bus using
>> mdev_device_create().
>> So how about we create mdevs on this bus using devlink instead of sysfs?
>> And the driver side on the host gets mdev_register_driver()->probe()?
>>
>
> After thinking more and reviewing more of the mdev code, I believe mdev fits
> this need a lot better than a new subdev bus, mfd, platform device, or devlink subport.
> In the coming future, mapping this sub device (mdev) to a VM will also be easier using the mdev bus.
>

Thanks for taking a close look at the mdev code.

Support for assigning an mdev device to a VM is already in place; QEMU and
libvirt can both assign mdev devices to VMs.

> I also believe we can use the sysfs interface for the mdev life cycle.
> Here, when mdevs are created, they will register as devlink instances and
> will be able to query/configure parameters before the driver probes the device
> (instead of managing the life cycle via devlink).
>
> A few enhancements would be needed on the mdev side:
> 1. Making the IOMMU optional.

Currently, mdev devices are not IOMMU aware; the vendor driver is
responsible for programming the IOMMU for an mdev device, if required.
The IOMMU-aware mdev device patch set is almost fully reviewed and ready to
be pulled. It is optional: the vendor driver has to decide whether an mdev
device should be associated with its parent's IOMMU or not. I'm testing it;
I think Alex is on vacation, and it will get pulled when he is back.
https://lwn.net/Articles/779650/

> 2. Configuring mdev device parameters at creation time.
>

The mdev framework provides a way to define multiple types for creation
through sysfs. Rather than having creation-time parameters, you can define
multiple types and, on each creation, update 'available_instances'
accordingly. Mdev also provides a way to expose vendor-specific attributes
for the parent physical device as well as for a created mdev device. You can
add a sysfs interface to take input parameters for an mdev device, which the
vendor driver can then use when open() is called on that mdev device.
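As a rough user-space illustration of the sysfs creation flow described above (a sketch, not part of the mdev framework itself): each supported type appears under /sys/class/mdev_bus/<parent>/mdev_supported_types/<type-id>/, where 'available_instances' reports how many more devices of that type can be created, and writing a UUID to 'create' instantiates one. The helper names below are hypothetical, and the parent device and type ID passed to them must come from your own system.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: build
 * /sys/class/mdev_bus/<parent>/mdev_supported_types/<type>/<attr>.
 * Returns 0 on success, -1 if the path does not fit in buf. */
int mdev_type_attr_path(char *buf, size_t len, const char *parent,
                        const char *type, const char *attr)
{
    int n = snprintf(buf, len,
                     "/sys/class/mdev_bus/%s/mdev_supported_types/%s/%s",
                     parent, type, attr);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Hypothetical helper: read the decimal count from an
 * 'available_instances' file. Returns -1 on any error. */
long read_available_instances(const char *path)
{
    FILE *f = fopen(path, "r");
    long count = -1;

    if (!f)
        return -1;
    if (fscanf(f, "%ld", &count) != 1)
        count = -1;
    fclose(f);
    return count;
}

/* Hypothetical helper: write a UUID to a type's 'create' file to
 * instantiate one mdev. Returns 0 on success or a negative errno. */
int create_mdev(const char *create_path, const char *uuid)
{
    size_t len = strlen(uuid);
    ssize_t ret;
    int err;
    int fd = open(create_path, O_WRONLY);

    if (fd < 0)
        return -errno;
    ret = write(fd, uuid, len);
    err = errno;            /* save before close() can overwrite it */
    close(fd);
    if (ret < 0)
        return -err;
    return (size_t)ret == len ? 0 : -EIO;
}
```

For example, a caller would build the 'available_instances' path for its parent device, check that the count is greater than zero, then build the 'create' path for the same type and pass it to create_mdev() together with a fresh UUID (e.g. from uuidgen); the new device then shows up under /sys/bus/mdev/devices/<uuid>/.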

Thanks,
Kirti