Re: [net-next PATCH 3/6] net: phylink: Correctly handle PCS probe defer from PCS provider

From: Christian Marangi
Date: Wed Mar 19 2025 - 13:35:38 EST


On Wed, Mar 19, 2025 at 05:02:50PM +0000, Russell King (Oracle) wrote:
> On Wed, Mar 19, 2025 at 05:18:50PM +0100, Christian Marangi wrote:
> > > > linkmode_fill(pl->supported);
> > > > linkmode_copy(pl->link_config.advertising, pl->supported);
> > > > - phylink_validate(pl, pl->supported, &pl->link_config);
> > > > + ret = phylink_validate(pl, pl->supported, &pl->link_config);
> > > > + /* The PCS might not available at the time phylink_create
> > > > + * is called. Check this and communicate to the MAC driver
> > > > + * that probe should be retried later.
> > > > + *
> > > > + * Notice that this can only happen in probe stage and PCS
> > > > + * is expected to be avaialble in phylink_major_config.
> > > > + */
> > > > + if (ret == -EPROBE_DEFER) {
> > > > + kfree(pl);
> > > > + return ERR_PTR(ret);
> > > > + }
> > >
> > > This does not solve the problem - what if the interface mode is
> > > currently not one that requires a PCS that may not yet be probed?
> >
> > Mhhh but what are the actual real world scenario for this? If a MAC
> > needs a dedicated PCS to handle multiple mode then it will probably
> > follow this new implementation and register as a provider.
> >
> > An option to handle your corner case might be an OP that wait for each
> > supported interface by the MAC and make sure there is a possible PCS for
> > it. And Ideally place it in the codeflow of validate_pcs ?
>
> I think you've fallen in to the trap of the stupid drivers that
> implement mac_select_pcs() as:
>
> static struct phylink_pcs *foo_mac_select_pcs(struct phylink_config *config,
> phy_interface_t interface)
> {
> struct foo_private *priv = phylink_to_foo(config);
>
> return priv->pcs;
> }
>
> but what drivers can (and should) be doing is looking at the interface
> argument, and working out which interface to return.
>
> Phylink is not designed to be single interface mode, single PCS driver
> despite what many MAC drivers do. Checking the phylink_validate()
> return code doesn't mean that all PCS exist for the MAC.
>
> > > I don't like the idea that mac_select_pcs() might be doing a complex
> > > lookup - that could make scanning the interface modes (as
> > > phylink_validate_mask() does) quite slow and unreliable, and phylink
> > > currently assumes that a PCS that is validated as present will remain
> > > present.
> >
> > The assumption "will remain present" is already very fragile with the
> > current PCS so I feel this should be changed or improved. Honestly every
> > PCS currently implemented can be removed and phylink will stay in an
> > undefined state.
>
> The fragility is because of the way networking works - there's nothing
> phylink can do about this.
>
> I take issue with "every PCS currently implemented" because it's
> actually not a correct statement.
>
> XPCS as used by stmmac does not fall into this.
> The PCS used by mvneta and mvpp2 do not fall into this.
> The PCS used by the Marvell DSA driver do not fall into this.
>
> It's only relatively recently with pcs-lynx and others that people have
> wanted them to be separate driver-model devices that this problem has
> occurred, and I've been pushing back on it saying we need to find a
> proper solution to it. I really haven't liked that we've merged drivers
> that cause this fragility without addressing that fragility.
>
> I've got to the point where I'm now saying no to new drivers that fail
> to address this, so we're at a crunch time when it needs to be
> addressed.
>
> We need to think about how to get around this fragility. The need to
> pre-validate the link modes comes from the netdev ethtool user
> interface itself - the need to tell userspace what link modes can be
> supported _before_ they get used. This API hasn't been designed with
> the idea that parts of a netdev might vanish at any particular time.
>
> > > If it goes away by the time phylink_major_config() is called, then we
> > > leave the phylink state no longer reflecting how the hardware is
> > > programmed, but we still continue to call mac_link_up() - which should
> > > probably be fixed.
> >
> > Again, the idea to prevent these kind of chicken-egg problem is to
> > enforce correct removal on the PCS driver side.
> >
> > > Given that netdev is severely backlogged, I'm not inclined to add to
> > > the netdev maintainers workloads by trying to fix this until after
> > > the merge window - it looks like they're at least one week behind.
> > > Consequently, I'm expecting that most patches that have been
> > > submitted during this week will be dropped from patchwork, which
> > > means submitting patches this week is likely not useful.
> >
> > Ok I will send next revision as RFC to not increase the "load" but IMHO
> > it's worth to discuss this... I really feel we need to fix the PCS
> > situation ASAP or more driver will come. (there are already 3 in queue
> > as stressed in the cover letter)
>
> Yes, we do need to fix it, but we need to recognise _all_ the issues
> it creates by doing this, and how we handle it properly.
>
> Right now, it's up to the MAC driver to get all the PCS it needs
> during its probe function, and *not* in the mac_select_pcs() method
> which has no way to propagate an error to anywhere sensible that
> could handle an EPROBE_DEFER response.
>
> My thoughts are that if a PCS goes away after a MAC driver has "got"
> it, then:
>
> 1. we need to recognise that those PHY interfaces and/or link modes
> are no longer available.
> 2. if the PCS was in-use, then the link needs to be taken down at
> minimum and the .pcs_disable() method needs to be called to
> release any resources that .pcs_enable() enabled (e.g. irq masks,
> power enables, etc.)
> 3. the MAC driver needs to be notified that the PCS pointer it
> stashed is no longer valid, so it doesn't return it for
> mac_select_pcs().

But why we need all these indirect handling and checks if we can
make use of .remove and shutdown the interface. A removal of a PCS
should cause the entire link to go down, isn't a dev_close enough to
propagate this? If and when the interface will came up checks are done
again and it will fail to go UP if PCS can't be found.

I know it's a drastic approach to call dev_close but link is down anyway
so lets reinit everything from scratch. It should handle point 2 and 3
right?

For point 1, additional entry like available_interface? And gets updated
once a PCS gets removed??? Or if we don't like the parsing hell we map
every interface to a PCS pointer? (not worth the wasted space IMHO)

>
> There's probably a bunch more that needs to happen, and maybe need
> to consider how to deal with "pcs came back".. but I haven't thought
> that through yet.
>

Current approach supports PCS came back as we check the global provider
list and the PCS is reachable again there.
(we tasted various scenario with unbind/bind while the interface was
up/down)

> --
> RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

--
Ansuel