Re: [RFC net-next v2 1/2] devlink: add whole device devlink instance

From: Jiri Pirko
Date: Mon Feb 24 2025 - 11:14:52 EST


Wed, Feb 19, 2025 at 05:32:54PM +0100, przemyslaw.kitszel@xxxxxxxxx wrote:
>Add a support for whole device devlink instance. Intented as a entity
>over all PF devices on given physical device.
>
>In case of ice driver we have multiple PF devices (with their devlink
>dev representation), that have separate drivers loaded. However those
>still do share lots of resources due to being the on same HW. Examples
>include PTP clock and RSS LUT. Historically such stuff was assigned to
>PF0, but that was both not clear and not working well. Now such stuff
>is moved to be covered into struct ice_adapter, there is just one instance
>of such per HW.
>
>This patch adds a devlink instance that corresponds to that ice_adapter,
>to allow arbitrage over resources (as RSS LUT) via it (further in the
>series (RFC NOTE: stripped out so far)).
>
>Thanks to Wojciech Drewek for very nice naming of the devlink instance:
>PF0: pci/0000:00:18.0
>whole-dev: pci/0000:00:18
>But I made this a param for now (driver is free to pass just "whole-dev").
>
>$ devlink dev # (Interesting part of output only)
>pci/0000:af:00:
> nested_devlink:
> pci/0000:af:00.0
> pci/0000:af:00.1
> pci/0000:af:00.2
> pci/0000:af:00.3
> pci/0000:af:00.4
> pci/0000:af:00.5
> pci/0000:af:00.6
> pci/0000:af:00.7


In general, I like this approach. In fact, I have quite similar
patch/set in my sandbox git.

The problem I didn't figure out how to handle, was a backing entity
for the parent devlink.

You use part of PCI BDF, which is obviously wrong:
1) bus_name/dev_name the user expects to be the backing device bus and
address on it (pci/usb/i2c). With using part of BDF, you break this
assumption.
2) 2 PFs can have totally different BDF (in VM for example). Then your
approach is broken.

I was thinking about having an auxiliary device created for the parent,
but auxiliary assumes it is child. The is upside-down.

I was thinking about having some sort of made-up per-driver bus, like
"ice" of "mlx5" with some thing like DSN that would act as a "dev_name".
I have a patch that introduces:

struct devlink_shared_inst;

struct devlink *devlink_shared_alloc(const struct devlink_ops *ops,
size_t priv_size, struct net *net,
struct module *module, u64 per_module_id,
void *inst_priv,
struct devlink_shared_inst **p_inst);
void devlink_shared_free(struct devlink *devlink,
struct devlink_shared_inst *inst);

I took a stab at it here:
https://github.com/jpirko/linux_mlxsw/commits/wip_dl_pfs_parent/
The work is not finished.


Also, I was thinking about having some made-up bus, like "pci_ids",
where instead of BDFs as addresses, there would be DSN for example.

None of these 3 is nice.

The shared parent entity for PFs (and other Fs) is always reference
counted, first creates, last removes. I feel like this is something
missing in PCI spec. If such beast would exist, very easy to implement
this in devlink. We have all we need in place already.