Re: [RFC PATCH net-next 3/9] net: pcs: Add helpers for registering and finding PCSs

From: Russell King (Oracle)
Date: Mon Jul 11 2022 - 16:59:24 EST

Hi Sean,

It's a good attempt and may be nice to have, but I'm afraid the
implementation has a flaw to do with the lifetime of data structures
which always becomes a problem when we have multiple devices being
used in aggregate.

On Mon, Jul 11, 2022 at 12:05:13PM -0400, Sean Anderson wrote:
> +/**
> + * pcs_get_tail() - Finish getting a PCS
> + * @pcs: The PCS to get, or %NULL if one could not be found
> + *
> + * This performs common operations necessary when getting a PCS (chiefly
> + * incrementing reference counts)
> + *
> + * Return: @pcs, or an error pointer on failure
> + */
> +static struct phylink_pcs *pcs_get_tail(struct phylink_pcs *pcs)
> +{
> + if (!pcs)
> + return ERR_PTR(-EPROBE_DEFER);
> +
> + if (!try_module_get(pcs->ops->owner))
> + return ERR_PTR(-ENODEV);

What you're trying to prevent here is the PCS going away - but holding a
reference to the module doesn't prevent that with the driver model. The
driver model design is such that a device can be unbound from its driver
at any moment. Taking a reference to the module doesn't prevent that,
all it does is ensure that the user can't remove the module. It doesn't
mean that the "pcs" structure will remain allocated.

The second issue that this creates is if a MAC driver creates the PCS
and then "gets" it through this interface, then the MAC driver module
ends up being locked in until the MAC driver devices are all unbound,
which isn't friendly at all.

So, anything that proposes to create a new subsystem where we have
multiple devices that make up an aggregate device needs to nicely cope
with any of those devices going away. For that to happen in this
instance, phylink would need to know that its in-use PCS for a
particular MAC is going away, then it could force the link down before
removing all references to the PCS device.

Another solution would be devlinks, but I am really not a fan of that
when there may be a single struct device backing multiple network
interfaces, where some of them may require PCS and others do not. One
wouldn't want the network interface with nfs-root to suddenly go away
because a PCS was unbound from its driver!

> + get_device(pcs->dev);

This helps, but not enough. All it means is the struct device won't
go away, the "pcs" can still go away if the device is unbound from the

RMK's Patch system:
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!