Re: [net-next PATCH 0/6] net: pcs: Introduce support for PCS OF
From: Christian Marangi
Date: Wed Mar 19 2025 - 13:45:07 EST
On Wed, Mar 19, 2025 at 05:29:34PM +0000, Russell King (Oracle) wrote:
> On Wed, Mar 19, 2025 at 12:58:36AM +0100, Christian Marangi wrote:
> > A PCS provider have to implement and call of_pcs_add_provider() in
> > probe function and define an xlate function to define how the PCS
> > should be provided based on the requested interface and phandle spec
> > defined in DT (based on the #pcs-cells)
> >
> > of_pcs_get() is provided to provide a specific PCS declared in DT
> > an index.
> >
> > A simple xlate function is provided for simple single PCS
> > implementation, of_pcs_simple_get.
> >
> > A PCS provider on driver removal should first call
> > phylink_pcs_release() to release the PCS from phylink and then
> > delete itself as a provider with of_pcs_del_provider() helper.
>
> This is inherently racy.
>
> phylink_pcs_release() may release the PCS from phylink, but there is a
> window between calling this and of_pcs_del_provider() where it could
> still be "got".
Ah I hoped the rtnl lock would protect from this. I need to check if
unpublish first cause any harm, in theory no as phylink should still
have his pointer to the pcs struct and select_pcs should not be called
on interface removal.
But this can also be handled by flagging a PCS as "to-be-removed" so
that it gets ignored if in this window it gets parsed by the PCS
provider API.
>
> The sequence always has to be:
>
> First, unpublish to prevent new uses.
> Then remove from current uses.
> Then disable hardware/remove resources.
>
> It makes me exceedingly sad that we make keep implementing the same
> mistakes time and time again - it was brought up at one of the OLS
> conferences back in the 2000s, probably around the time that the
> driver model was just becoming "a thing". At least I can pass on
> this knowledge when I spot it and help others to improve!
>
> Note that networking's unregister_netdev() recognises this pattern,
> and unregister_netdev() will first unpublish the interface thereby
> making it inaccessible to be brought up, then take the interface down
> if it were up before returning - thus guaranteeing that when the
> function returns, it is safe to dispose of any and all resources that
> the driver was using.
>
> Sorry as I seem to be labouring this point.
>
No problem, some subsystem are so complex that these info are pure gold
and gets lots after years, especially with global lock in place where
someone can think they are enough to handle everything.
--
Ansuel