Re: [PATCH 0/3] Fix for KSZ DSA switch shutdown

From: Lino Sanfilippo
Date: Thu Sep 09 2021 - 12:38:18 EST


On 09.09.21 at 17:47, Vladimir Oltean wrote:
> On Thu, Sep 09, 2021 at 03:19:52PM +0200, Lino Sanfilippo wrote:
>>> Do you see similar things on your 5.10 kernel?
>>
>> For the master device I see
>>
>> lrwxrwxrwx 1 root root 0 Sep 9 14:10 /sys/class/net/eth0/device/consumer:spi:spi3.0 -> ../../../virtual/devlink/platform:fd580000.ethernet--spi:spi3.0
>
> So this is the worst of the worst, we have a device link but it doesn't help.
>
> Where the device link helps is here:
>
> __device_release_driver
> 	while (device_links_busy(dev))
> 		device_links_unbind_consumers(dev);
>
> but during dev_shutdown, device_links_unbind_consumers does not get called
> (actually I am not even sure whether it should).
>
> I've reproduced your issue by making this very simple change:
>
> diff --git a/drivers/net/ethernet/freescale/enetc/enetc_pf.c b/drivers/net/ethernet/freescale/enetc/enetc_pf.c
> index 60d94e0a07d6..ec00f34cac47 100644
> --- a/drivers/net/ethernet/freescale/enetc/enetc_pf.c
> +++ b/drivers/net/ethernet/freescale/enetc/enetc_pf.c
> @@ -1372,6 +1372,7 @@ static struct pci_driver enetc_pf_driver = {
>  	.id_table = enetc_pf_id_table,
>  	.probe = enetc_pf_probe,
>  	.remove = enetc_pf_remove,
> +	.shutdown = enetc_pf_remove,
>  #ifdef CONFIG_PCI_IOV
>  	.sriov_configure = enetc_sriov_configure,
>  #endif
>
> on my DSA master driver. This is what the genet driver has "special".
>

Ah, that is interesting.

> I was led into grave error by Documentation/driver-api/device_link.rst,
> which I've based my patch on, where it clearly says that device links
> are supposed to help with shutdown ordering (how?!).
>
> So the question is, why did my DSA trees get torn down on shutdown?
> Basically the short answer is that my SPI controller driver does
> implement .shutdown, and calls the same code path as the .remove code,
> which calls spi_unregister_controller which removes all SPI children.
>
> When I added this device link, one of the main objectives was to not
> modify all DSA drivers. I was certain based on the documentation that
> device links would help, now I'm not so sure anymore.
>
> So what happens is that the DSA master attempts to unregister its net
> device on .shutdown, but DSA does not implement .shutdown, so it just
> sits there holding a reference (supposedly via dev_hold, but where from?!)
> to the master, which makes netdev_wait_allrefs wait and wait.
>

Right, that was also my conclusion.

> I need more time for the denial phase to pass, and to understand what
> can actually be done. I will also be away from the keyboard for the next
> few days, so it might take a while. Your patches obviously offer a
> solution only for KSZ switches, we need something more general. If I
> understand your solution, it works not by virtue of there being any
> shutdown ordering guarantee at all, but simply due to the fact that
> DSA's .shutdown hook gets called eventually, and the reference to the
> master gets freed eventually, which unblocks the unregister_netdevice
> call from the master.

Well, actually the SPI shutdown hook gets called, which then calls ksz9477_shutdown()
(formerly ksz9477_reset_switch()). That function shuts down the switch by
stopping the worker thread and tearing down the DSA tree (via dsa_tree_shutdown()).

While it is true that the patch series only fixes the KSZ case for now, the idea was that
other drivers could use a similar approach by calling the new function dsa_tree_shutdown()
in their shutdown handlers to make sure that all refs to the master device are released.
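
To illustrate the reference problem, here is a small user-space toy model (not
kernel code). All toy_* names are made up for this mail, and the refcount is a
plain int standing in for the real netdev reference tracking. It only shows why
the master's unregister can complete once the switch side drops its reference
in a shutdown handler, and why it cannot when no such handler exists. It does
not model any ordering between the two shutdown calls.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the master net device: only the refcount matters here. */
struct toy_netdev {
	int refcnt;
};

/* Toy stand-in for a DSA switch holding one reference on its master. */
struct toy_switch {
	struct toy_netdev *master;
	bool holds_master_ref;
};

/* Setup takes a reference on the master (hypothetical model of dev_hold). */
static void toy_switch_setup(struct toy_switch *sw, struct toy_netdev *master)
{
	sw->master = master;
	master->refcnt++;
	sw->holds_master_ref = true;
}

/*
 * Models a switch driver's shutdown handler that tears down the tree and
 * thereby releases the reference to the master. Safe to call twice.
 */
static void toy_switch_shutdown(struct toy_switch *sw)
{
	if (sw->holds_master_ref) {
		sw->master->refcnt--;
		sw->holds_master_ref = false;
	}
}

/* Models the wait in unregister: it can only finish with zero refs left. */
static bool toy_master_can_unregister(const struct toy_netdev *master)
{
	return master->refcnt == 0;
}

/*
 * Runs one simulated system shutdown and reports whether the master's
 * unregister would be able to complete.
 */
bool toy_shutdown_sequence(bool switch_has_shutdown_hook)
{
	struct toy_netdev master = { .refcnt = 0 };
	struct toy_switch sw;

	toy_switch_setup(&sw, &master);

	if (switch_has_shutdown_hook)
		toy_switch_shutdown(&sw);

	return toy_master_can_unregister(&master);
}
```

With the hook, toy_shutdown_sequence(true) reports that the unregister can
finish. Without it, toy_shutdown_sequence(false) reports that one reference is
still held, which corresponds to the endless wait we see on the master.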


Regards,
Lino