RE: [RFC net-next 8/8] net/mlx5: Add subdev driver to bind to subdev devices

From: Parav Pandit
Date: Fri Mar 01 2019 - 12:21:24 EST




> -----Original Message-----
> From: Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx>
> Sent: Friday, March 1, 2019 1:22 AM
> To: Parav Pandit <parav@xxxxxxxxxxxx>
> Cc: netdev@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
> michal.lkml@xxxxxxxxxxx; davem@xxxxxxxxxxxxx; Jiri Pirko
> <jiri@xxxxxxxxxxxx>
> Subject: Re: [RFC net-next 8/8] net/mlx5: Add subdev driver to bind to
> subdev devices
>
> On Thu, Feb 28, 2019 at 11:37:52PM -0600, Parav Pandit wrote:
> > Add a subdev driver to probe the subdev devices and create fake
> > netdevice for it.
>
> So I'm guessing here is the "meat" of the whole goal here?
>
> You just want multiple netdevices per PCI device? Why can't you do that
> today in your PCI driver?
>
Yes, but it just not multiple netdevices.
Let me please elaborate in detail.

There is a swichdev mode of a PCI function for netdevices.
In this mode a given netdev has additional control netdev (called representor netdevice = rep-ndev).
This rep-ndev is attached to OVS for adding rules, offloads etc using standard tc, netfilter infra.
Currently this rep-ndev controls switch side of the settings, but not the host side of netdev.
So there is discussion to create another netdev or devlink port..

Additionally this subdev has optional rdma device too.

And when we are in switchdev mode, this rdma dev has similar rdma rep device for control.

In some cases we actually don't create netdev when it is in InfiniBand mode.
Here there is PCI device->rdma_device.

In other case, a given sub device for rdma is dual port device, having netdevice for each that can use existing netdev->dev_port.

Creating 4 devices of two different classes using one iproute2/ip or iproute2/rdma command is horrible thing to do.

In case if this sub device has to be a passthrough device, ip link command will fail badly that day, because we are creating some sub device which is not even a netdevice.

So iproute2/devlink which works on bus+device, mainly PCI today, seems right abstraction point to create sub devices.
This also extends to map ports of the device, health, registers debug, etc rich infrastructure that is already built.

Additionally, we don't want mlx driver and other drivers to go through its child devices (split logic in netdev and rdma) for power management.
Kernel core code does that well today, that we like to leverage through subdev bus or mfd pm callbacks.

So it is lot more than just creating netdevices.

> What problem are you trying to solve that others also are having that
> requires all of this?
>
> Adding a new bus type and subsystem is fine, but usually we want more
> than just one user of it, as this does not really show how it is exercised very
> well.
This subdev and devlink infrastructure solves this problem of creating smaller sub devices out of one PCI device.
Someone has to start.. :-)

To my knowledge, currently Netronome, Broadcom and Mellanox are actively using this devlink and switchdev infra today.
I added Jakub from Netronome, he is in netdev mailing list, but added in CC, to listen his feedback.

> Ideally 3 users would be there as that is when it proves itself that it is
> flexible enough.
>

We were looking at drivers/visorbus if we can repurpose it, but GUID device naming scheme is just not user friendly.
It has only single s-Par user and whose guest drivers are still in staging for more than a year now. So doesn't really fit well.

> Would just using the mfd subsystem work better for you? That provides
> core support for "multi-function" drivers/devices already. What is missing
> from that subsystem that does not work for you here?
>
We were not aware of mfd until now. I looked at very high level now. It's a wrapper to platform devices and seems widely use.
Before subdev proposal, Jason suggested an alternative is to create platform devices and driver attach to it.

When I read kernel documentation [1], it says "platform devices typically appear as autonomous entities"
Here instead of autonomy, it is in user's control.
Platform devices probably don't disappear a lot in live system as opposed to subdevices which are created and removed dynamically a lot often.

Not sure if platform device is abuse for this purpose or not.
So which direction to go, devlink->mfd(platform wrapper) or devlink->subdev would be obviously a huge blessing.

[1] https://www.kernel.org/doc/Documentation/driver-model/platform.txt