RE: [RFC net-next 1/8] subdev: Introducing subdev bus

From: Parav Pandit
Date: Fri Mar 01 2019 - 11:36:01 EST


Hi Greg,

> -----Original Message-----
> From: Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx>
> Sent: Friday, March 1, 2019 1:17 AM
> To: Parav Pandit <parav@xxxxxxxxxxxx>
> Cc: netdev@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
> michal.lkml@xxxxxxxxxxx; davem@xxxxxxxxxxxxx; Jiri Pirko
> <jiri@xxxxxxxxxxxx>
> Subject: Re: [RFC net-next 1/8] subdev: Introducing subdev bus
>
> On Thu, Feb 28, 2019 at 11:37:45PM -0600, Parav Pandit wrote:
> > Introduce a new subdev bus which holds sub devices created from a
> > primary device. These devices are named as 'subdev'.
> > A subdev is identified similarly to pci device using 16-bit vendor id
> > and device id.
> > Unlike PCI devices, scope of subdev is limited to Linux kernel.
>
> But these are limited to only PCI devices, right?
>
For Mellanox use case yes, its limited to PCI devices.

> This sounds a lot like that ARM proposal a week or so ago that asked for
> something like this, are you working with them to make sure your proposal
> works for them as well? (sorry, can't find where that was announced, it was
> online somewhere...)
>
We were not aware of it, mostly because we are either on net side of mailing lists (netdev, rdma, virt etc).
ARM proposal likely on linux-kernel, I guess.
I will lookup that proposal and surely see if both of us can use common infrastructure.

> > A central entry that assigns unique subdev vendor and device id is:
> > include/linux/subdev_ids.h enums. Enum are chosen over define macro so
> > that two vendors do not end up with vendor id in kernel development
> > process.
>
> Why not just make it dynamic with on static ids?
>
Can you please elaborate?
Do you mean we should use something similar to pci_add_dynid() with enhancement to catch duplicate id addition?

> > subdev bus holds subdevices of multiple devices. A typical created
> > subdev for a PCI device in sysfs tree appears under their parent's
> > device as using core's default device naming scheme:
> >
> > subdev<instance_id>.
> > i.e.
> > subdev0
> > subdev1
> >
> > $ ls -l /sys/bus/pci/devices/0000:05:00.0 [..]
> > drwxr-xr-x 4 root root 0 Feb 13 15:57 subvdev0
> > drwxr-xr-x 4 root root 0 Feb 13 15:57 subvdev1
> >
> > Device model view:
> > ------------------
> > +------+ +------+ +------+
> > |subdev| |subdev| |subdev|
> > -----| 1 |----| 2 |-------| 3 |----------
> > | +--|---+ +-|----+ +--|---+ |
> > --------|----------|---subdev bus--|--------------
> > | | |
> > +--+----+-----+ +---+---+
> > |pcidev | |pcidev |
> > -----| A |-----------------| B |----------
> > | +-------+ +-------+ |
> > -------------------pci bus------------------------
>
> To be clear, "subdev bus" is just a logical grouping, there is no physical
> backing "bus" here at all, right?
>
Yep. that's correct.

> What is going to "bind" to subdev devices? PCI drivers? Or new types of
> drivers?
>
Devices are placed on subdev bus using devlink interface. And drivers which registers using subdev_register_driver(), their probe() method will be called.
So yes, those are PCI vendor driver.
I tried to capture this in cover-letter.
At present users didn't ask to map this subdev to VM, but there is very high chance that once we have this without PCI SR-IOV, they would like to extend to VMs too.
So in that case devlink will have option to say, add 'passthrough' device, and in that case instead of vendor's pci driver, some high level vfio type driver will bind to it.
That is just the anticipation, but we haven't really worked out this fully.
But this model allows to do so.

> > subdev are allocated and freed using subdev_alloc(), subdev_free() APIs.
> > A driver which wants to create actual class driver such as
> > net/block/infiniband need to use subdev_register_driver(),
> > subdev_unregister_driver() APIs.
> >
> > +++ b/drivers/subdev/Kconfig
> > @@ -0,0 +1,12 @@
> > +#
> > +# subdev configuration
> > +#
> > +
> > +config SUBDEV
> > + tristate "subdev bus driver"
> > + help
> > + The subdev bus driver allows creating hardware based sub devices
> > + from a parent device. The subdev bus driver is required to create,
> > + discover devices and to attach device drivers to this subdev
> > + devices. These subdev devices are created using devlink tool by
> > + user.
>
>
> Your definition of the bus uses the name of the bus in the definition :)
>
I will better word this in v2 if we don't go mfd route.

> > diff --git a/drivers/subdev/Makefile b/drivers/subdev/Makefile new
> > file mode 100644 index 0000000..405b74a
> > --- /dev/null
> > +++ b/drivers/subdev/Makefile
> > @@ -0,0 +1,8 @@
> > +# SPDX-License-Identifier: GPL-2.0
> > +#
> > +# Makefile for subdev bus driver
> > +#
> > +
> > +obj-$(CONFIG_SUBDEV) += subdev.o
> > +
> > +subdev-y := subdev_main.o
> > diff --git a/drivers/subdev/subdev_main.c
> > b/drivers/subdev/subdev_main.c new file mode 100644 index
> > 0000000..4aabcaa
> > --- /dev/null
> > +++ b/drivers/subdev/subdev_main.c
> > @@ -0,0 +1,153 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +
> > +#include <linux/module.h>
> > +#include <linux/device.h>
> > +#include <linux/slab.h>
> > +#include <linux/subdev_bus.h>
> > +
> > +static DEFINE_XARRAY_FLAGS(subdev_ids, XA_FLAGS_ALLOC);
>
> Why not an idr?
>
Matt is running large patch series to get rid of idr/ida and replacing with xarray.
So instead of creating more work for him, thought to start with xarray from beginning.

> > +
> > +int __subdev_register_driver(struct subdev_driver *drv, struct module
> *owner,
> > + const char *mod_name)
> > +{
> > + drv->driver.name = mod_name;
> > + drv->driver.bus = &subdev_bus_type;
> > + drv->driver.owner = owner;
> > + drv->driver.mod_name = mod_name;
> > +
> > + return driver_register(&drv->driver); }
> > +EXPORT_SYMBOL(__subdev_register_driver);
>
> EXPORT_SYMBOL_GPL() for this and the other ones as you are just wrapping
> the driver core logic loosely.
>
I see. Yes. will fix this.

> > +/**
> > + * subdev_add_dev - Add a sub device to bus.
> > + * @subdev: subdev devie to be placed on the bus
> > + * @parent_dev: Parent device of the subdev
> > + * @vid: Vendor ID of the device
> > + * @did: Device ID of the device
> > + *
> > + * Returns 0 on successfully adding subdev to bus or error code on failure.
> > + * Once the device is added, it can be probed by the device driver
> > +who
> > + * wish to match it.
> > + *
> > + */
> > +int subdev_add_dev(struct subdev *subdev, struct device *parent_dev,
> > + enum subdev_vendor_id vid, enum subdev_device_id did) {
> > + u32 id = 0;
> > + int ret;
> > +
> > + if (!parent_dev)
> > + return -EINVAL;
>
> No root devices?
>
I didn't get the comment. Intent of this check is subdev must have parent. Parent type doesn't matter.

> > +
> > + ret = xa_alloc(&subdev_ids, &id, UINT_MAX, NULL, GFP_KERNEL);
>
> No locking needed?
>
Documentation at [1] describes that xa_alloc() and xa_erase() takes the lock internally.
[1] https://www.kernel.org/doc/html/latest/core-api/xarray.html

> > +module_init(subdev_init);
> > +module_exit(subdev_exit);
> > +
> > +MODULE_LICENSE("GPL");
>
> Nit, for a few more weeks, this needs to be "GPL v2".
>
Ok.

> > #endif /* LINUX_MOD_DEVICETABLE_H */
> > diff --git a/include/linux/subdev_bus.h b/include/linux/subdev_bus.h
> > new file mode 100644 index 0000000..c6410e3
> > --- /dev/null
> > +++ b/include/linux/subdev_bus.h
> > @@ -0,0 +1,63 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +
> > +#ifndef SUBDEV_BUS_H
> > +#define SUBDEV_BUS_H
> > +
> > +#include <linux/mod_devicetable.h>
> > +#include <linux/device.h>
> > +#include <linux/subdev_ids.h>
> > +
> > +struct subdev_driver {
> > + const struct subdev_id *id_table;
> > + struct device_driver driver;
> > + struct list_head list;
> > +};
> > +
> > +#define to_subdev_driver(x) container_of(x, struct subdev_driver,
> > +driver)
> > +
> > +int __subdev_register_driver(struct subdev_driver *drv, struct module
> *owner,
> > + const char *mod_name);
> > +#define subdev_register_driver(driver) \
> > + __subdev_register_driver(driver, THIS_MODULE, KBUILD_MODNAME)
> > +
> > +void subdev_unregister_driver(struct subdev_driver *dev);
> > +
> > +/**
> > + * subdev - A subdev device representation
> > + *
> > + * @dev: device struct that represent subdev device in core device
> model
> > + * @dev_id: Unique vendor id, device id that subdev device drivers match
> > + * against. A unique id that defines this subdev assigned in
> > + * include/linux/subdev_ids.h
> > + */
> > +struct subdev {
> > + struct device dev;
> > + struct subdev_id dev_id;
> > +};
> > +
> > +#endif
> > diff --git a/include/linux/subdev_ids.h b/include/linux/subdev_ids.h
> > new file mode 100644 index 0000000..361faa3
> > --- /dev/null
> > +++ b/include/linux/subdev_ids.h
> > @@ -0,0 +1,17 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +
> > +#ifndef SUBDEV_IDS_H
> > +#define SUBDEV_IDS_H
> > +
> > +enum subdev_vendor_id {
> > + SUBDEV_VENDOR_ID_MELLANOX,
> > +
> > + /* new device id must be added above at the end */
>
> Again, why ids at all?
>
Do you mean, we should just string or something to match on which driver to bind to the device?
Or do something similar to pci_add_dynid()?

> So far, this is just a very loose wrapping of the driver core bus functionality,
Right, it is wrapped to avoid creating just random device_driver and random device objects in device tree.

> which is fine, but I really don't see the goal here...

I see you have extended this question in mail where you ask about creating netdevices and using mfd.
So lets discuss in that context as it is more appropriate there. This patch is just code..