Re: [PATCH v3 0/9] Phy, mdiobus, and netdev struct device fixes

From: Florian Fainelli
Date: Thu Sep 24 2015 - 21:40:15 EST


On 24/09/15 12:17, Russell King - ARM Linux wrote:
> Hi,
>
> The third version of this series fixes the build error which David
> identified, and drops the broken changes for the Cavium Thunder BGX
> ethernet driver as this driver requires some complex changes to
> resolve the leakage - and this is best done by people who can test
> the driver.
>
> Compared to v2, the only patch which has changed is patch 6
> "net: fix phy refcounting in a bunch of drivers"
>
> I _think_ I've been able to build-test all the drivers touched by
> that patch to some degree now, though several of them needed their
> Kconfig hacked to allow it (not all had a "|| COMPILE_TEST" clause
> on their dependencies.)

Tested-by: Florian Fainelli <f.fainelli@xxxxxxxxx>
Reviewed-by: Florian Fainelli <f.fainelli@xxxxxxxxx>

Thanks for fixing that.

>
> Previous cover letters below:
>
> This is the second version of the series, with the comments David had
> on the first patch fixed up. Original series description with updated
> diffstat below.
>
> While looking at the DSA code, I noticed we have a
> of_find_net_device_by_node(), and it looks like users of that are
> similarly buggy - it looks like net/dsa/dsa.c is the only user. Fix
> that too.
>
> Hi,
>
> While looking at the phy code, I identified a number of weaknesses
> where refcounting on device structures was being leaked, where
> modules could be removed while in-use, and where the fixed-phy could
> end up having unintended consequences caused by incorrect calls to
> fixed_phy_update_state().
>
> This patch series resolves those issues, some of which were discovered
> with testing on an Armada 388 board. Not all patches are fully tested,
> particularly the one which touches several network drivers.
>
> When resolving the struct device refcounting problems, several different
> solutions were considered before settling on the implementation here -
> one of the considerations was to avoid touching many network drivers.
> The solution here is:
>
> phy_attach*() - takes a refcount
> phy_detach*() - drops the phy_attach refcount
>
> Provided drivers always attach and detach their phys, which they should
> already be doing, this should change nothing, even if they leak a refcount.
>
> of_phy_find_device() and of_* functions which use that take
> a refcount. Arrange for this refcount to be dropped once
> the phy is attached.
>
> This is the reason why the previous change is important - we can't drop
> this refcount taken by of_phy_find_device() until something else holds
> a reference on the device. This resolves the leaked refcount caused by
> using of_phy_connect() or of_phy_attach().
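
The ordering described above (look up, attach, then drop the lookup
reference) can be sketched as a small userspace model. This is not the
kernel code: every model_* name below is hypothetical, and the plain int
counter only stands in for the struct device refcount.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for a phy and its struct device refcount. */
struct model_phy {
	int refcount;	/* models the embedded struct device reference count */
	int attached;	/* non-zero while a netdev is attached */
};

/* Models of_phy_find_device(): a successful lookup returns a held reference. */
static struct model_phy *model_find(struct model_phy *phy)
{
	if (phy)
		phy->refcount++;
	return phy;
}

/* Models dropping one reference on the phy's struct device. */
static void model_put(struct model_phy *phy)
{
	phy->refcount--;
}

/* Models phy_attach*(): takes its own reference while attached. */
static int model_attach(struct model_phy *phy)
{
	if (phy->attached)
		return -1;
	phy->refcount++;
	phy->attached = 1;
	return 0;
}

/* Models phy_detach*(): drops the reference taken by model_attach(). */
static void model_detach(struct model_phy *phy)
{
	phy->attached = 0;
	phy->refcount--;
}

/* Models of_phy_connect(): look up, attach, then drop the lookup reference. */
static int model_connect(struct model_phy *phy)
{
	struct model_phy *found = model_find(phy);
	int ret;

	if (!found)
		return -1;
	ret = model_attach(found);
	/* On success the attachment now holds its own reference. */
	model_put(found);
	return ret;
}
```

Starting from a count of 1 (the registration reference in this model),
model_connect() leaves the count at 2 and model_detach() returns it to 1,
so nothing is leaked across a connect/detach cycle.
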
>
> Even without the above changes, these drivers are leaking by calling
> of_phy_find_device(). These drivers are addressed by adding the
> appropriate release of that refcount.
>
> The mdiobus code also suffered from the same kind of leak, but thankfully
> this only happened in one place - the mdio-mux code.
>
> I also found that the try_module_get() in the phy layer code was utterly
> useless: phydev->dev.driver was guaranteed to always be NULL, so
> try_module_get() was always being called with a NULL argument. I proved
> this with my SFP code, which declares its own MDIO bus - the module use
> count was never incremented irrespective of how I set the MDIO bus up.
> This allowed the MDIO bus code to be removed from the kernel while there
> were still PHYs attached to it.
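
To illustrate why a NULL owner makes the module pinning a no-op, here is
a minimal userspace sketch. It assumes that a NULL module pointer is
reported as success without counting anything, which matches the symptom
described above; model_module and model_try_module_get() are
hypothetical names, not the kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for a module; only the use count is modelled. */
struct model_module {
	int use_count;
	bool going_away;	/* set once unload has started */
};

/*
 * Assumption for this sketch: a NULL owner means "nothing to pin" and
 * is reported as success, so the caller cannot tell that no reference
 * was taken.
 */
static bool model_try_module_get(struct model_module *owner)
{
	if (!owner)
		return true;
	if (owner->going_away)
		return false;
	owner->use_count++;
	return true;
}
```

With a NULL owner the call succeeds and no use count moves, so in this
model nothing stops the module from being unloaded while it is still in
use.
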
>
> One other bug was discovered: while using in-band-status with mvneta, it
> was found that if a real phy is attached with in-band-status enabled,
> and another ethernet interface is using the fixed-phy infrastructure, the
> interface using the fixed-phy infrastructure is configured according to
> the other interface using the in-band-status - which is caused by the
> fixed-phy code not verifying that the phy_device passed in is actually
> a fixed-phy device, rather than a real MDIO phy.
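
The missing check can be sketched in userspace as follows. The sketch
assumes the confusion arises because a real MDIO phy can share an
address with a fixed-phy entry; the mechanism and all model_* names are
illustrative assumptions, not the actual fixed_phy.c code.

```c
#include <assert.h>
#include <stddef.h>

struct model_bus {
	int id;
};

struct model_phydev {
	struct model_bus *bus;	/* bus the phy lives on */
	int addr;		/* address on that bus */
};

struct model_fixed_entry {
	int addr;
	int link;
};

/* The emulated bus owned by the fixed-phy model, with one entry. */
static struct model_bus fixed_bus = { 0 };
static struct model_fixed_entry fixed_table[] = { { 1, 0 } };

/*
 * Update the link state of a fixed phy. Matching on the address alone
 * would let a real MDIO phy alias a fixed entry, so the bus is checked
 * before the table is searched.
 */
static int model_fixed_update(struct model_phydev *phydev, int link)
{
	size_t i;

	if (!phydev || phydev->bus != &fixed_bus)
		return -1;	/* not a fixed phy: refuse */

	for (i = 0; i < sizeof(fixed_table) / sizeof(fixed_table[0]); i++) {
		if (fixed_table[i].addr == phydev->addr) {
			fixed_table[i].link = link;
			return 0;
		}
	}
	return -1;
}
```

A phy on any other bus is rejected even when its address matches, so the
state of the fixed entry only changes for a genuine fixed phy.
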
>
> Lastly, having mdio_bus reversing phy_device_register() internals seems
> like a layering violation - it's trivial to move that code to the phy
> device layer.
>
>  drivers/net/ethernet/apm/xgene/xgene_enet_hw.c | 24 ++++++----
>  drivers/net/ethernet/freescale/gianfar.c       |  6 ++-
>  drivers/net/ethernet/freescale/ucc_geth.c      |  8 +++-
>  drivers/net/ethernet/marvell/mvneta.c          |  2 +
>  drivers/net/ethernet/xilinx/xilinx_emaclite.c  |  2 +
>  drivers/net/phy/fixed_phy.c                    |  2 +-
>  drivers/net/phy/mdio-mux.c                     | 19 +++++---
>  drivers/net/phy/mdio_bus.c                     | 24 ++++++----
>  drivers/net/phy/phy_device.c                   | 62 ++++++++++++++++++++------
>  drivers/of/of_mdio.c                           | 27 +++++++++--
>  include/linux/phy.h                            |  6 ++-
>  net/core/net-sysfs.c                           |  9 ++++
>  net/dsa/dsa.c                                  | 41 ++++++++++++++---
>  13 files changed, 181 insertions(+), 51 deletions(-)
>


--
Florian