Re: [PATCH net-next] net: phylink: guard link replay helpers against NULL phylink instance
From: Russell King (Oracle)
Date: Tue Feb 17 2026 - 10:48:54 EST
On Tue, Feb 17, 2026 at 02:51:25PM +0100, Andrew Lunn wrote:
> On Tue, Feb 17, 2026 at 09:22:25AM +0100, Paolo Abeni wrote:
> > On 2/5/26 8:23 PM, Vladimir Oltean wrote:
> > > There is a crash when unbinding the sja1105 driver under special
> > > circumstances:
> > >
> > > Unable to handle kernel NULL pointer dereference at virtual address 0000000000000030
> > > Call trace:
> > > phylink_run_resolve_and_disable+0x10/0x90
> > > sja1105_static_config_reload+0xc0/0x410
> > > sja1105_vlan_filtering+0x100/0x140
> > > dsa_port_vlan_filtering+0x13c/0x368
> > > dsa_port_reset_vlan_filtering.isra.0+0xe8/0x198
> > > dsa_port_bridge_leave+0x130/0x248
> > > dsa_user_changeupper.part.0+0x74/0x158
> > > dsa_user_netdevice_event+0x50c/0xa50
> > > notifier_call_chain+0x78/0x148
> > > raw_notifier_call_chain+0x20/0x38
> > > call_netdevice_notifiers_info+0x58/0xa8
> > > __netdev_upper_dev_unlink+0xac/0x220
> > > netdev_upper_dev_unlink+0x38/0x70
> > > del_nbp+0x1a4/0x320
> > > br_del_if+0x3c/0xd8
> > > br_device_event+0xf8/0x2d8
> > > notifier_call_chain+0x78/0x148
> > > raw_notifier_call_chain+0x20/0x38
> > > call_netdevice_notifiers_info+0x58/0xa8
> > > unregister_netdevice_many_notify+0x314/0x848
> > > unregister_netdevice_queue+0xe8/0xf8
> > > dsa_user_destroy+0x50/0xa8
> > > dsa_port_teardown+0x80/0x98
> > > dsa_switch_teardown_ports+0x4c/0xb8
> > > dsa_switch_deinit+0x94/0xb8
> > > dsa_switch_put_tree+0x2c/0xc0
> > > dsa_unregister_switch+0x38/0x60
> > > sja1105_remove+0x24/0x40
> > > spi_remove+0x38/0x60
> > > device_remove+0x54/0x90
> > > device_release_driver_internal+0x1d4/0x230
> > > device_driver_detach+0x20/0x38
> > > unbind_store+0xbc/0xc8
> > > ---[ end trace 0000000000000000 ]---
> > >
> > > which requires an explanation.
> > >
> > > When a port offloads a bridge, the switch must be reset to change
> > > the VLAN awareness state (the SJA1105_VLAN_FILTERING reason for
> > > sja1105_static_config_reload()). When the port leaves a VLAN-aware
> > > bridge, it must also be reset for the same reason: it is returning
> > > to operation as a VLAN-unaware standalone port.
> > >
> > > sja1105_static_config_reload() triggers the phylink link replay helpers.
> > >
> > > Because sja1105 is a switch, it has multiple user ports. During unbind,
> > > ports are torn down one by one in dsa_switch_teardown_ports() ->
> > > dsa_port_teardown() -> dsa_user_destroy().
> > >
> > > The crash happens when the numerically first user port is not part of
> > > the VLAN-aware bridge, but any other user port is.
> > >
> > > Tearing down the first user port causes phylink_destroy() to be called
> > > on dp->pl, and this pointer to be set to NULL. Then, when the second
> > > user port is torn down, this was offloading a VLAN-aware bridge port, so
> > > indirectly it will trigger sja1105_static_config_reload().
> > >
> > > The latter function iterates using dsa_switch_for_each_available_port(),
> > > and unconditionally dereferences dp->pl, including for the
> > > aforementioned torn down previous port, and passes that to phylink.
> > > This is where the NULL pointer is coming from.
> > >
> > > There are multiple levels at which this could be avoided:
> > > - add an "if (dp->pl)" in sja1105_static_config_reload()
> > > - make the phylink replay helpers NULL-tolerant
> > > - mark ports as DSA_PORT_TYPE_UNUSED after dsa_port_phylink_destroy()
> > > has run, such that subsequent dsa_switch_for_each_available_port()
> > > iterations skip them
> > > - disconnect the entire switch at once from switchdev and
> > > NETDEV_CHANGEUPPER events while unbinding, not just port by port,
> > > likely using a "ds->unbinding = true" mechanism or similar
> > >
> > > however options 3 and 4 are quite heavy and might have side effects,
> > > option 1 is very unassuming and option 2 seems a more elegant variant
> > > of 1, given the fact that sja1105 is the only user of these phylink
> > > replay helpers. It allows to keep the driver simple and is the option
> > > I went with.
> > >
> > > Functionally speaking, transforming the replay helpers into no-ops for
> > > ports without a phylink instance is fine, because that only happens
> > > during driver removal (an operation which cannot be cancelled). The
> > > ports are not required to work.
> > >
> > > Fixes: 0b2edc531e0b ("net: dsa: sja1105: let phylink help with the replay of link callbacks")
> > > Signed-off-by: Vladimir Oltean <vladimir.oltean@xxxxxxx>
> >
> > I think this patch could land on current net, but it would be nice an
> > ack from phylib SMEs.
>
> Sorry, weekend away.
>
> I prefer option 1. I _think_ option 2 only works because the MAC
> driver set dp->pl to NULL. phylink is not responsible for the NULL, so
> it seems odd for phylink to assume there is a NULL. Only the MAC
> driver knows if the MAC driver has set dp->pl to NULL.
I also prefer option 1. Currently, none of phylink's driver API permits
struct phylink to be NULL, and I'd prefer that to remain the case for
consistency.
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!