Re: [RFC net-next 0/8] Introducing subdev bus and devlink extension

From: Kirti Wankhede
Date: Thu Mar 07 2019 - 16:07:58 EST




On 3/8/2019 2:32 AM, Parav Pandit wrote:
>
>
>> -----Original Message-----
>> From: Kirti Wankhede <kwankhede@xxxxxxxxxx>
>> Sent: Thursday, March 7, 2019 2:54 PM
>> To: Parav Pandit <parav@xxxxxxxxxxxx>; Jakub Kicinski
>> <jakub.kicinski@xxxxxxxxxxxxx>
>> Cc: Or Gerlitz <gerlitz.or@xxxxxxxxx>; netdev@xxxxxxxxxxxxxxx; linux-
>> kernel@xxxxxxxxxxxxxxx; michal.lkml@xxxxxxxxxxx; davem@xxxxxxxxxxxxx;
>> gregkh@xxxxxxxxxxxxxxxxxxx; Jiri Pirko <jiri@xxxxxxxxxxxx>; Alex
>> Williamson <alex.williamson@xxxxxxxxxx>
>> Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink extension
>>
>>
>>
>> <snip>
>>
>>>>>
>>>>> Yes. I got my patches to adapt to mdev way. Will be posting RFC v2 soon.
>>>>> Will wait for a day to receive more comments/views from Greg and
>> others.
>>>>>
>>>>> As I explained in this cover-letter and discussion, First use case
>>>>> is to create and use mdevs in the host (and not in VM).
>>>>> Later on, I am sure once we have mdevs available, VM users will
>>>>> likely use
>>>> it.
>>>>>
>>>>> So, mlx5_core driver will have two components as starting point.
>>>>>
>>>>> 1. drivers/net/ethernet/mellanox/mlx5/core/mdev/mdev.c
>>>>> This is mdev device life cycle driver which will do,
>>>>> mdev_register_device()
>>>> and implements mlx5_mdev_ops.
>>>>>
>>>> Ok. I would suggest not use mdev.c file name, may be add device name,
>>>> something like mlx_mdev.c or vfio_mlx.c
>>>>
>>> mlx5/core is coding convention is not following to prefix mlx to its 40+
>> files.
>>>
>>> it uses actual subsystem or functionality name, such as, sriov.c
>>> eswitch.c fw.c en_tc.c (en for Ethernet) lag.c so, mdev.c aligns to
>>> rest of the 40+ files.
>>>
>>>
>>>>> 2. drivers/net/ethernet/mellanox/mlx5/core/mdev/mdev_driver.c
>>>>> This is mdev device driver which does mdev_register_driver() and
>>>>> probe() creates netdev by heavily reusing existing code of the PF device.
>>>>> These drivers will not be placed under drivers/vfio/mdev, because
>>>>> this is
>>>> not a vfio driver.
>>>>> This is fine, right?
>>>>>
>>>>
>>>> I'm not too familiar with netdev, but can you create netdev on open()
>>>> call on mlx mdev device? Then you don't have to write mdev device
>> driver.
>>>>
>>> Who invokes open() and release()?
>>> I believe it is the qemu would do open(), release, read/write/mmap?
>>>
>>> Assuming that is the case,
>>> I think its incorrect to create netdev in open.
>>> Because when we want to map the mdev to VM using above mdev calls, we
>> actually wont be creating netdev in host.
>>> Instead, some queues etc will be setup as part of these calls.
>>>
>>> By default this created mdev is bound to vfio_mdev.
>>> And once we unbind the device from this driver, we need to bind to mlx5
>> driver so that driver can create the netdev etc.
>>>
>>> Or did I get open() and friends call wrong?
>>>
>>
>> In 'struct mdev_parent_ops' there are create() and remove(). When user
>> creates mdev device by writing UUID to create sysfs, vendor driver's
>> create() callback gets called. This should be used to allocate/commit
> Yes. I am already past that stage.
>
>> resources from parent device and on remove() callback free those resources.
>> So there is no need to bind mlx5 driver to that mdev device.
>>
> If we don't bind mlx5 driver, vfio_mdev driver is bound to it. Such driver won't create netdev.

Doesn't need to.

Create netdev from create() callback.

Thanks,
Kirti

> Again, we do not want to map this mdev to a VM.
> We want to consume it in the host where mdev is created.
> So I am able to detach this mdev from vfio_mdev driver as usaual using
> $ echo mdev_name > ../drivers/vfio_mdev/unbind
>
> Followed by binding it to mlx5_core driver.
>
> Below is sample output before binding it to mlx5_core driver.
> When we bind with mlx5_core driver, that driver creates the netdev in host.
> If user wants to map this mdev to VM, user won't bind to mlx5_core driver. instead he will bind to vfio driver and that does usual open/release/...
>
>
> lrwxrwxrwx 1 root root 0 Mar 7 14:24 69ea1551-d054-46e9-974d-8edae8f0aefe -> ../../../devices/pci0000:00/0000:00:02.2/0000:05:00.0/69ea1551-d054-46e9-974d-8edae8f0aefe
> [root@sw-mtx-036 net-next]# ls -l /sys/bus/mdev/devices/69ea1551-d054-46e9-974d-8edae8f0aefe/
> total 0
> lrwxrwxrwx 1 root root 0 Mar 7 14:24 driver -> ../../../../../bus/mdev/drivers/vfio_mdev
> lrwxrwxrwx 1 root root 0 Mar 7 14:24 iommu_group -> ../../../../../kernel/iommu_groups/0
> lrwxrwxrwx 1 root root 0 Mar 7 14:24 mdev_type -> ../mdev_supported_types/mlx5_core-mgmt
> drwxr-xr-x 2 root root 0 Mar 7 14:24 power
> --w------- 1 root root 4096 Mar 7 14:24 remove
> lrwxrwxrwx 1 root root 0 Mar 7 14:24 subsystem -> ../../../../../bus/mdev
> -rw-r--r-- 1 root root 4096 Mar 7 14:24 uevent
>
>> open/release/read/write/mmap/ioctl are regular file operations for that
>> mdev device.
>>
>
>> Thanks,
>> Kirti
>