Re: [PATCH net-next] net: phylink: guard link replay helpers against NULL phylink instance

From: Andrew Lunn

Date: Tue Feb 17 2026 - 08:52:32 EST


On Tue, Feb 17, 2026 at 09:22:25AM +0100, Paolo Abeni wrote:
> On 2/5/26 8:23 PM, Vladimir Oltean wrote:
> > There is a crash when unbinding the sja1105 driver under special
> > circumstances:
> >
> > Unable to handle kernel NULL pointer dereference at virtual address 0000000000000030
> > Call trace:
> > phylink_run_resolve_and_disable+0x10/0x90
> > sja1105_static_config_reload+0xc0/0x410
> > sja1105_vlan_filtering+0x100/0x140
> > dsa_port_vlan_filtering+0x13c/0x368
> > dsa_port_reset_vlan_filtering.isra.0+0xe8/0x198
> > dsa_port_bridge_leave+0x130/0x248
> > dsa_user_changeupper.part.0+0x74/0x158
> > dsa_user_netdevice_event+0x50c/0xa50
> > notifier_call_chain+0x78/0x148
> > raw_notifier_call_chain+0x20/0x38
> > call_netdevice_notifiers_info+0x58/0xa8
> > __netdev_upper_dev_unlink+0xac/0x220
> > netdev_upper_dev_unlink+0x38/0x70
> > del_nbp+0x1a4/0x320
> > br_del_if+0x3c/0xd8
> > br_device_event+0xf8/0x2d8
> > notifier_call_chain+0x78/0x148
> > raw_notifier_call_chain+0x20/0x38
> > call_netdevice_notifiers_info+0x58/0xa8
> > unregister_netdevice_many_notify+0x314/0x848
> > unregister_netdevice_queue+0xe8/0xf8
> > dsa_user_destroy+0x50/0xa8
> > dsa_port_teardown+0x80/0x98
> > dsa_switch_teardown_ports+0x4c/0xb8
> > dsa_switch_deinit+0x94/0xb8
> > dsa_switch_put_tree+0x2c/0xc0
> > dsa_unregister_switch+0x38/0x60
> > sja1105_remove+0x24/0x40
> > spi_remove+0x38/0x60
> > device_remove+0x54/0x90
> > device_release_driver_internal+0x1d4/0x230
> > device_driver_detach+0x20/0x38
> > unbind_store+0xbc/0xc8
> > ---[ end trace 0000000000000000 ]---
> >
> > which requires an explanation.
> >
> > When a port offloads a bridge, the switch must be reset to change
> > the VLAN awareness state (the SJA1105_VLAN_FILTERING reason for
> > sja1105_static_config_reload()). When the port leaves a VLAN-aware
> > bridge, it must also be reset for the same reason: it is returning
> > to operation as a VLAN-unaware standalone port.
> >
> > sja1105_static_config_reload() triggers the phylink link replay helpers.
> >
> > Because sja1105 is a switch, it has multiple user ports. During unbind,
> > ports are torn down one by one in dsa_switch_teardown_ports() ->
> > dsa_port_teardown() -> dsa_user_destroy().
> >
> > The crash happens when the numerically first user port is not part of
> > the VLAN-aware bridge, but any other user port is.
> >
> > Tearing down the first user port causes phylink_destroy() to be called
> > on dp->pl, and this pointer to be set to NULL. Then, when the second
> > user port is torn down, because it was offloading a VLAN-aware bridge
> > port, it indirectly triggers sja1105_static_config_reload().
> >
> > The latter function iterates using dsa_switch_for_each_available_port(),
> > and unconditionally dereferences dp->pl, including for the
> > aforementioned torn down previous port, and passes that to phylink.
> > This is where the NULL pointer is coming from.
> >
> > There are multiple levels at which this could be avoided:
> > - add an "if (dp->pl)" in sja1105_static_config_reload()
> > - make the phylink replay helpers NULL-tolerant
> > - mark ports as DSA_PORT_TYPE_UNUSED after dsa_port_phylink_destroy()
> > has run, such that subsequent dsa_switch_for_each_available_port()
> > iterations skip them
> > - disconnect the entire switch at once from switchdev and
> > NETDEV_CHANGEUPPER events while unbinding, not just port by port,
> > likely using a "ds->unbinding = true" mechanism or similar
> >
> > However, options 3 and 4 are quite heavy and might have side effects.
> > Option 1 is very unassuming, and option 2 seems a more elegant variant
> > of 1, given that sja1105 is the only user of these phylink
> > replay helpers. It keeps the driver simple and is the option
> > I went with.
> >
> > Functionally speaking, transforming the replay helpers into no-ops for
> > ports without a phylink instance is fine, because that only happens
> > during driver removal (an operation which cannot be cancelled). The
> > ports are not required to work.
> >
> > Fixes: 0b2edc531e0b ("net: dsa: sja1105: let phylink help with the replay of link callbacks")
> > Signed-off-by: Vladimir Oltean <vladimir.oltean@xxxxxxx>
>
> I think this patch could land on current net, but it would be nice to
> have an ack from phylib SMEs.

Sorry, weekend away.

I prefer option 1. I _think_ option 2 only works because the MAC
driver sets dp->pl to NULL. phylink is not responsible for the NULL, so
it seems odd for phylink to assume there is a NULL. Only the MAC
driver knows whether it has set dp->pl to NULL.
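
To make the distinction concrete, here is a rough userspace sketch of
the two shapes. It is a mock, not the real sja1105 or phylink code:
struct mock_phylink, struct mock_port and all mock_* functions are
hypothetical stand-ins, and the assert() only models the NULL
dereference from the trace above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct phylink and struct dsa_port. */
struct mock_phylink {
	int replays;
};

struct mock_port {
	struct mock_phylink *pl;	/* NULL once the port is torn down */
};

/* Replay helper that trusts its caller; assert models the oops. */
static void mock_replay(struct mock_phylink *pl)
{
	assert(pl != NULL);
	pl->replays++;
}

/* Option 2 shape: the helper itself tolerates a NULL instance. */
static void mock_replay_null_tolerant(struct mock_phylink *pl)
{
	if (!pl)
		return;
	pl->replays++;
}

/*
 * Option 1 shape: the caller, which owns the pointer and is the one
 * that knows it may be NULL, skips such ports before calling in.
 * Returns the number of ports that were replayed.
 */
static int mock_reload_option1(struct mock_port *ports, size_t n)
{
	int replayed = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!ports[i].pl)
			continue;
		mock_replay(ports[i].pl);
		replayed++;
	}

	return replayed;
}
```

With option 1 the knowledge that the pointer can be NULL stays with
the code that cleared it, and the helper keeps a strict contract.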

Andrew