Re: [PATCH v2 net] net: phy: change devlink flag to AUTOREMOVE_SUPPLIER for non-SFP PHYs
From: Maxime Chevallier
Date: Mon Feb 02 2026 - 12:42:40 EST
On 02/02/2026 15:25, Russell King (Oracle) wrote:
> On Mon, Feb 02, 2026 at 12:10:41PM +0100, Maxime Chevallier wrote:
>> Hi Wei,
>>
>> On 02/02/2026 06:45, Wei Fang wrote:
>>> For the shared MDIO bus use case, multiple MACs will share the same MDIO
>>> bus. Therefore, these MACs all depend on this MDIO bus. If this shared
>>> MDIO bus is removed, all the PHY devices attached to this MDIO bus will
>>> also be removed. Consequently, the MAC driver should not access the PHY
>>> device, otherwise, it will lead to some potential crashes. Because the
>>> corresponding phydev and the mii_bus have been freed, some pointers have
>>> become invalid.
>>>
>>> For example, Abhishek reported a crash that occurred if the MDIO
>>> bus driver was removed first, followed by the MAC driver. The crash log
>>> is as below.
>>>
>>> Call trace:
>>> __list_del_entry_valid_or_report+0xa8/0xe0
>>> __device_link_del+0x40/0xf0
>>> device_link_put_kref+0xb4/0xc8
>>> device_link_del+0x38/0x58
>>> phy_detach+0x2c/0x170
>>> phy_disconnect+0x4c/0x70
>>> phylink_disconnect_phy+0x6c/0xc0 [phylink]
>>> stmmac_release+0x60/0x358 [stmmac]
>>>
>>> Another example is the i.MX95-15x15 platform which has two ENETC ports.
>>> When all the external PHYs are managed by the EMDIO (the MDIO controller)
>>> and the enetc driver is removed after the EMDIO driver, users will see
>>> the crash log below and the console hangs.
>>>
>>> Call trace:
>>> _phy_state_machine+0x230/0x36c (P)
>>> phy_stop+0x74/0x190
>>> phylink_stop+0x28/0xb8
>>> enetc_close+0x28/0x8c
>>> __dev_close_many+0xb4/0x1d8
>>> netif_close_many+0x8c/0x13c
>>> enetc4_pf_remove+0x2c/0x84
>>> pci_device_remove+0x44/0xe8
>>>
>>> To address this issue, Sarosh Hasan tried to change the devlink flag to
>>> DL_FLAG_AUTOREMOVE_SUPPLIER [1], so that the MAC driver will be removed
>>> along with the PHY driver. However, the solution does not take into
>>> account the hot-swappable PHY devices (SFP PHYs), so when the PHY device
>>> is unplugged, the MAC driver will automatically be removed, which is not
>>> the expected behavior. This issue should not exist for SFP PHYs, so based
>>> on Sarosh's patch, the flag is changed to DL_FLAG_AUTOREMOVE_SUPPLIER
>>> for non-SFP PHYs.
>>>
>>> Reported-by: Abhishek Chauhan (ABC) <quic_abchauha@xxxxxxxxxxx>
>>> Closes: https://lore.kernel.org/all/d696a426-40bb-4c1a-b42d-990fb690de5e@xxxxxxxxxxx/
>>> Link: https://lore.kernel.org/imx/20250703090041.23137-1-quic_sarohasa@xxxxxxxxxxx/ # [1]
>>> Fixes: bc66fa87d4fd ("net: phy: Add link between phy dev and mac dev")
>>> Suggested-by: Maxime Chevallier <maxime.chevallier@xxxxxxxxxxx>
>>> Signed-off-by: Wei Fang <wei.fang@xxxxxxx>
>>
>> I gave that patch a test, with the following cases :
>>
>> - On Macchiatobin (we have PHYs that share an mdiobus).
>> When unbinding a PHY, the MAC disappears as well:
>
> Correct, this is why these band-aids are harmful. One "device" can
> correspond with *multiple* network interfaces, and the loss of one
> PHY can have a *very* detrimental effect.
>
> Consider the case where root-NFS is being used, and removing a PHY
> on another interface takes out the interface that root-NFS is
> using. Your machine is now dead in the water.

That's what I've been seeing. I unbound one PHY, it took out 3 netdevs
and I don't have any log explaining why. I guess there are devlink debug
knobs for that, but they don't seem to be enabled by default.

However, we seem to have the issue even without this patch.

On MCBin, if I unbind eth1 for example, all 3 interfaces that are on CP1
are gone:

  cd /sys/class/net/eth1/device/driver
  echo f4000000.ethernet > unbind

Only eth0 is now left. This is on net-next/main :(

For Wei's case, where unbinding netdev 1 brings down the MDIO bus used
by the PHY on netdev 2, we'd also be dead in the water no matter what,
no?

> In my opinion, we should be concentrating more on the issue behind
> the oops.
>
> Given that this problem is because of the bus being removed, one
> thing that would help would be for the MDIO bus to be properly
> refcounted, and when the bus is unbound, to replace the bus ops
> with versions that return -ENXIO or similar under the MII bus
> lock. This would be easier if the MDIO bus ops were a separate struct
> to struct mii_bus.
>
> Similarly with the PHY itself - if the PHY is in-use, it should be
> refcounted to stop the struct phy_device from going away, and
> should we have the situation where the PHY driver is unbound,
> phydev->drv should be set to a set of dummy ops (under the phydev
> mutex and probably rtnl.)
>
> It seems to me that throwing devlinks at this problem is giving us
> more problems than it's solving.
>
> A graceful way to handle a MAC losing its PHY is for phylib to
> indicate that the PHY has gone down, rather than removing the
> network interface (and potentially a whole host of other network
> interfaces in the case of one struct device being associated
> with many interfaces.)
>
Agreed, that's quite the can of worms though I suspect :(
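
To make the bus-ops idea quoted above a bit more concrete, here is a
rough userspace model of "swap in dead ops under the bus lock". Nothing
in it is real kernel API: struct model_bus, struct model_bus_ops and the
model_bus_*() helpers are hypothetical names, a pthread mutex stands in
for the MII bus lock, and refcounting of the bus object itself is left
out entirely. It is only meant to show the shape of the approach.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

#define MODEL_NUM_REGS 32

/* Hypothetical: bus accessors split out into their own ops struct. */
struct model_bus_ops {
	int (*read)(void *priv, int addr, int regnum);
	int (*write)(void *priv, int addr, int regnum, unsigned short val);
};

/* Hypothetical stand-in for a bus object; this is not struct mii_bus. */
struct model_bus {
	pthread_mutex_t lock;            /* stands in for the MII bus lock */
	const struct model_bus_ops *ops; /* swapped when the bus is unbound */
	void *priv;                      /* "hardware" state, gone after unbind */
};

/* Live accessors: priv points at a fake register file. */
static int live_read(void *priv, int addr, int regnum)
{
	unsigned short *regs = priv;

	(void)addr;
	return regs[(unsigned int)regnum % MODEL_NUM_REGS];
}

static int live_write(void *priv, int addr, int regnum, unsigned short val)
{
	unsigned short *regs = priv;

	(void)addr;
	regs[(unsigned int)regnum % MODEL_NUM_REGS] = val;
	return 0;
}

/* Dead accessors: never touch priv, always report "no such device". */
static int dead_read(void *priv, int addr, int regnum)
{
	(void)priv; (void)addr; (void)regnum;
	return -ENXIO;
}

static int dead_write(void *priv, int addr, int regnum, unsigned short val)
{
	(void)priv; (void)addr; (void)regnum; (void)val;
	return -ENXIO;
}

static const struct model_bus_ops live_ops = { live_read, live_write };
static const struct model_bus_ops dead_ops = { dead_read, dead_write };

static void model_bus_init(struct model_bus *bus, unsigned short *regs)
{
	assert(bus && regs);
	pthread_mutex_init(&bus->lock, NULL);
	bus->ops = &live_ops;
	bus->priv = regs;
}

/* Every access goes through the ops pointer while holding the lock. */
static int model_bus_read(struct model_bus *bus, int addr, int regnum)
{
	int ret;

	pthread_mutex_lock(&bus->lock);
	ret = bus->ops->read(bus->priv, addr, regnum);
	pthread_mutex_unlock(&bus->lock);
	return ret;
}

static int model_bus_write(struct model_bus *bus, int addr, int regnum,
			   unsigned short val)
{
	int ret;

	pthread_mutex_lock(&bus->lock);
	ret = bus->ops->write(bus->priv, addr, regnum, val);
	pthread_mutex_unlock(&bus->lock);
	return ret;
}

/* On unbind, swap the ops under the same lock so late callers get -ENXIO. */
static void model_bus_unbind(struct model_bus *bus)
{
	pthread_mutex_lock(&bus->lock);
	bus->ops = &dead_ops;
	bus->priv = NULL;
	pthread_mutex_unlock(&bus->lock);
}
```

In this model a caller that still holds a pointer to the bus after
model_bus_unbind() gets -ENXIO back instead of dereferencing freed
state, which only works if the bus object itself is kept alive by a
reference count, and that part is not modelled here.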
Maxime