Re: [PATCH v2 net] net: phy: change devlink flag to AUTOREMOVE_SUPPLIER for non-SFP PHYs

From: Russell King (Oracle)

Date: Mon Feb 02 2026 - 09:33:15 EST


On Mon, Feb 02, 2026 at 12:10:41PM +0100, Maxime Chevallier wrote:
> Hi Wei,
>
> On 02/02/2026 06:45, Wei Fang wrote:
> > For the shared MDIO bus use case, multiple MACs will share the same MDIO
> > bus. Therefore, these MACs all depend on this MDIO bus. If this shared
> > MDIO bus is removed, all the PHY devices attached to this MDIO bus will
> > also be removed. Consequently, the MAC driver should not access the PHY
> > device, otherwise, it will lead to some potential crashes. Because the
> > corresponding phydev and the mii_bus have been freed, some pointers have
> > become invalid.
> >
> > For example, Abhishek reported a crash that occurred when the MDIO
> > bus driver was removed first, followed by the MAC driver. The crash log
> > is below.
> >
> > Call trace:
> > __list_del_entry_valid_or_report+0xa8/0xe0
> > __device_link_del+0x40/0xf0
> > device_link_put_kref+0xb4/0xc8
> > device_link_del+0x38/0x58
> > phy_detach+0x2c/0x170
> > phy_disconnect+0x4c/0x70
> > phylink_disconnect_phy+0x6c/0xc0 [phylink]
> > stmmac_release+0x60/0x358 [stmmac]
> >
> > Another example is the i.MX95-15x15 platform, which has two ENETC ports.
> > When all the external PHYs are managed by the EMDIO (the MDIO controller)
> > and the enetc driver is removed after the EMDIO driver, users will see
> > the crash log below and the console hangs.
> >
> > Call trace:
> > _phy_state_machine+0x230/0x36c (P)
> > phy_stop+0x74/0x190
> > phylink_stop+0x28/0xb8
> > enetc_close+0x28/0x8c
> > __dev_close_many+0xb4/0x1d8
> > netif_close_many+0x8c/0x13c
> > enetc4_pf_remove+0x2c/0x84
> > pci_device_remove+0x44/0xe8
> >
> > To address this issue, Sarosh Hasan tried to change the devlink flag to
> > DL_FLAG_AUTOREMOVE_SUPPLIER [1], so that the MAC driver will be removed
> > along with the PHY driver. However, the solution does not take into
> > account the hot-swappable PHY devices (SFP PHYs), so when the PHY device
> > is unplugged, the MAC driver will automatically be removed, which is not
> > the expected behavior. This issue should not exist for SFP PHYs, so based
> > on Sarosh's patch, the flag is changed to DL_FLAG_AUTOREMOVE_SUPPLIER
> > for non-SFP PHYs.
> >
> > Reported-by: Abhishek Chauhan (ABC) <quic_abchauha@xxxxxxxxxxx>
> > Closes: https://lore.kernel.org/all/d696a426-40bb-4c1a-b42d-990fb690de5e@xxxxxxxxxxx/
> > Link: https://lore.kernel.org/imx/20250703090041.23137-1-quic_sarohasa@xxxxxxxxxxx/ # [1]
> > Fixes: bc66fa87d4fd ("net: phy: Add link between phy dev and mac dev")
> > Suggested-by: Maxime Chevallier <maxime.chevallier@xxxxxxxxxxx>
> > Signed-off-by: Wei Fang <wei.fang@xxxxxxx>
>
> I gave that patch a test, with the following cases :
>
> - On Macchiatobin (we have PHYs that share an mdiobus).
> When unbinding a PHY, the MAC disappears as well :

Correct, this is why these band-aids are harmful. One "device" can
correspond with *multiple* network interfaces, and the loss of one
PHY can have a *very* detrimental effect.

Consider the case where root-NFS is being used, and removing a PHY
on another interface takes out the interface that root-NFS is
using. Your machine is now dead in the water.

In my opinion, we should be concentrating more on the issue behind
the oops.

Given that this problem is because of the bus being removed, one
thing that would help would be for the MDIO bus to be properly
refcounted, and when the bus is unbound, to replace the bus ops
with versions that return -ENXIO or similar under the MII bus
lock. This would be easier if the MDIO bus ops were a separate struct
from struct mii_bus.
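
To illustrate the idea, here is a minimal userspace sketch, not kernel
code. Every demo_* name is hypothetical, a pthread mutex stands in for
the MII bus lock, and the "live" accessors fake a bus driver. It assumes
the accessors live in their own ops struct, which is not how struct
mii_bus is laid out today.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical: accessors split out of the bus object so they can be swapped. */
struct demo_mdio_ops {
	int (*read)(void *priv, int addr, int regnum);
	int (*write)(void *priv, int addr, int regnum, uint16_t val);
};

struct demo_mii_bus {
	pthread_mutex_t lock;            /* stands in for the MII bus lock */
	int refcount;                    /* keeps the bus object alive for its users */
	const struct demo_mdio_ops *ops; /* replaced when the bus is unbound */
	void *priv;                      /* driver state, gone after unbind */
};

/* Replacement accessors: never touch hardware or driver state. */
static int demo_dead_read(void *priv, int addr, int regnum)
{
	(void)priv; (void)addr; (void)regnum;
	return -ENXIO;
}

static int demo_dead_write(void *priv, int addr, int regnum, uint16_t val)
{
	(void)priv; (void)addr; (void)regnum; (void)val;
	return -ENXIO;
}

const struct demo_mdio_ops demo_dead_ops = {
	.read = demo_dead_read,
	.write = demo_dead_write,
};

/* Fake accessors standing in for a real bus driver (assumption). */
static int demo_live_read(void *priv, int addr, int regnum)
{
	(void)priv; (void)addr; (void)regnum;
	return 0x0141;
}

static int demo_live_write(void *priv, int addr, int regnum, uint16_t val)
{
	(void)priv; (void)addr; (void)regnum; (void)val;
	return 0;
}

const struct demo_mdio_ops demo_live_ops = {
	.read = demo_live_read,
	.write = demo_live_write,
};

void demo_bus_init(struct demo_mii_bus *bus, const struct demo_mdio_ops *ops,
		   void *priv)
{
	pthread_mutex_init(&bus->lock, NULL);
	bus->refcount = 1;
	bus->ops = ops;
	bus->priv = priv;
}

/* A user (e.g. an attached PHY) takes a reference on the bus object. */
void demo_bus_get(struct demo_mii_bus *bus)
{
	pthread_mutex_lock(&bus->lock);
	bus->refcount++;
	pthread_mutex_unlock(&bus->lock);
}

/* Returns 1 when the last reference was dropped and the bus may be freed. */
int demo_bus_put(struct demo_mii_bus *bus)
{
	int last;

	pthread_mutex_lock(&bus->lock);
	assert(bus->refcount > 0);
	last = (--bus->refcount == 0);
	pthread_mutex_unlock(&bus->lock);
	return last;
}

/* Every access goes through the ops pointer while holding the bus lock. */
int demo_bus_read(struct demo_mii_bus *bus, int addr, int regnum)
{
	int ret;

	pthread_mutex_lock(&bus->lock);
	ret = bus->ops->read(bus->priv, addr, regnum);
	pthread_mutex_unlock(&bus->lock);
	return ret;
}

/* Unbind: swap the ops under the lock; the object itself stays allocated. */
void demo_bus_unbind(struct demo_mii_bus *bus)
{
	pthread_mutex_lock(&bus->lock);
	bus->ops = &demo_dead_ops;
	bus->priv = NULL;
	pthread_mutex_unlock(&bus->lock);
}
```

In this model, a late demo_bus_read() from a MAC that still holds a
reference gets -ENXIO back instead of dereferencing freed memory.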

Similarly with the PHY itself - if the PHY is in-use, it should be
refcounted to stop the struct phy_device from going away, and
should we have the situation where the PHY driver is unbound,
phydev->drv should be set to a set of dummy ops (under the phydev
mutex and probably rtnl.)
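
A rough userspace model of that, again with hypothetical demo_* names
and a pthread mutex standing in for the phydev mutex (rtnl is not
modelled). The choice of -ENODEV from the dummy op is an assumption made
for the sketch, not a proposal for the exact error code.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

struct demo_phy_device;

/* Hypothetical, cut-down stand-in for a PHY driver's ops. */
struct demo_phy_driver {
	const char *name;
	int (*read_status)(struct demo_phy_device *phydev);
};

struct demo_phy_device {
	pthread_mutex_t lock;              /* stands in for the phydev mutex */
	int refcount;                      /* held by whoever is using the PHY */
	const struct demo_phy_driver *drv; /* never NULL while references exist */
	bool link;
};

/* Dummy op installed once the real driver is unbound. */
static int demo_dummy_read_status(struct demo_phy_device *phydev)
{
	phydev->link = false;
	return -ENODEV;
}

const struct demo_phy_driver demo_dummy_drv = {
	.name = "dummy",
	.read_status = demo_dummy_read_status,
};

/* Fake "real" driver that always reports link up (assumption). */
static int demo_real_read_status(struct demo_phy_device *phydev)
{
	phydev->link = true;
	return 0;
}

const struct demo_phy_driver demo_real_drv = {
	.name = "real",
	.read_status = demo_real_read_status,
};

void demo_phy_init(struct demo_phy_device *phydev,
		   const struct demo_phy_driver *drv)
{
	pthread_mutex_init(&phydev->lock, NULL);
	phydev->refcount = 1;
	phydev->drv = drv;
	phydev->link = false;
}

/* The MAC takes a reference when it connects to the PHY. */
void demo_phy_get(struct demo_phy_device *phydev)
{
	pthread_mutex_lock(&phydev->lock);
	phydev->refcount++;
	pthread_mutex_unlock(&phydev->lock);
}

/* Returns true when the last reference is gone and the object may be freed. */
bool demo_phy_put(struct demo_phy_device *phydev)
{
	bool last;

	pthread_mutex_lock(&phydev->lock);
	assert(phydev->refcount > 0);
	last = (--phydev->refcount == 0);
	pthread_mutex_unlock(&phydev->lock);
	return last;
}

/* Driver unbind: swap in the dummy ops under the lock, do not free. */
void demo_phy_driver_unbind(struct demo_phy_device *phydev)
{
	pthread_mutex_lock(&phydev->lock);
	phydev->drv = &demo_dummy_drv;
	pthread_mutex_unlock(&phydev->lock);
}

/* Callers never need a NULL check on drv; they just see an error. */
int demo_phy_read_status(struct demo_phy_device *phydev)
{
	int ret;

	pthread_mutex_lock(&phydev->lock);
	ret = phydev->drv->read_status(phydev);
	pthread_mutex_unlock(&phydev->lock);
	return ret;
}
```

The point of the model is that the object outlives the driver unbind for
as long as the MAC holds its reference, so the later detach path has
valid memory to work on.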

It seems to me that throwing devlinks at this problem is giving us
more problems than it's solving.

A graceful way to handle a MAC losing its PHY is for phylib to
indicate that the PHY has gone down, rather than removing the
network interface (and potentially a whole host of other network
interfaces in the case of one struct device being associated
with many interfaces.)
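
As a toy model of that behaviour (hypothetical demo_* names throughout,
plain userspace C, no relation to the real phylib structures): when the
PHY goes away, the MAC's link-change callback is told the link is down,
and the network interface stays registered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a network interface. */
struct demo_netdev {
	bool registered; /* interface still exists */
	bool carrier;    /* link state as seen by the stack */
};

struct demo_lost_phy {
	bool link;
	struct demo_netdev *ndev;
	void (*link_change)(struct demo_lost_phy *phy, bool up);
};

/* MAC-side callback: only the carrier state changes. */
static void demo_mac_link_change(struct demo_lost_phy *phy, bool up)
{
	phy->ndev->carrier = up;
}

void demo_phy_attach(struct demo_lost_phy *phy, struct demo_netdev *ndev)
{
	phy->link = false;
	phy->ndev = ndev;
	phy->link_change = demo_mac_link_change;
}

void demo_phy_link_up(struct demo_lost_phy *phy)
{
	assert(phy->link_change != NULL);
	if (!phy->link) {
		phy->link = true;
		phy->link_change(phy, true);
	}
}

/* PHY (or its bus) went away: report link down, leave the netdev alone. */
void demo_phy_gone(struct demo_lost_phy *phy)
{
	assert(phy->link_change != NULL);
	if (phy->link) {
		phy->link = false;
		phy->link_change(phy, false);
	}
}
```

After demo_phy_gone(), the interface in this model merely loses carrier;
other interfaces sharing the same struct device are unaffected.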

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!