Re: [PATCH 3/3] net: stmmac: Add DWMAC glue layer for Renesas GBETH
From: Lad, Prabhakar
Date: Wed Mar 05 2025 - 16:28:30 EST
Hi Russell,
On Mon, Mar 3, 2025 at 4:32 PM Russell King (Oracle)
<linux@xxxxxxxxxxxxxxx> wrote:
>
> On Mon, Mar 03, 2025 at 04:04:55PM +0000, Lad, Prabhakar wrote:
> > Hi Russell,
> >
> > On Mon, Mar 3, 2025 at 11:19 AM Russell King (Oracle)
> > <linux@xxxxxxxxxxxxxxx> wrote:
> > > I would like to get to the bottom of why this fails for module removal/
> > > insertion, but not for admistratively down/upping the interface.
> > >
> > > Removal of your module will unregister the netdev, and part of that
> > > work will bring the netdev administratively down. When re-inserting
> > > the module, that will trigger various userspace events, and it will
> > > be userspace bringing the network interface(s) back up. This should
> > > be no different from administratively down/upping the interface but
> > > it seems you get different behaviour.
> > >
> > > I'd like to understand why that is, because at the moment I'm wondering
> > > whether my patches that address the suspend/resume need further work
> > > before I send them - but in order to assess that, I need to work out
> > > why your issue only seems to occur in the module removal/insertion
> > > and not down/up as well as I'd expect.
> > >
> > > Please could you investigate this?
> > >
> > Sure I will look into this. Just wanted to check on your platform does
> > unload/load work OK? Also do you know any specific reason why DMA
> > reset could be failing so that I can look at it closer.
>
> It may be surprising, but I do not have stmmac hardware (although
> there is some I might be able to use, it's rather complicated so I
> haven't investigated that.) However, there's a lot of past history
> here, because stmmac has been painful for me as phylink maintainer.
> Consequently, I'm now taking a more active role in this driver,
> cleaning it up and fixing some of the stuff it's got wrong.
>
> That said, NVidia are in the process of arranging hardware for me.
>
> You are not the first to encounter reset failures, and this has always
> come down to clocks that aren't running.
>
> The DWMAC core is documented as requiring *all* clocks for each part of
> the core to be running in order for software reset to complete. If any
> clock is stopped, then reset will fail. That includes the clk_rx_i /
> clk_rx_180_i signals that come from the ethernet PHY's receive clock.
>
I did investigate on these lines:
1] With my current patch series I have the below in remove callback
+static void renesas_gbeth_remove(struct platform_device *pdev)
+{
+ struct renesas_gbeth *gbeth = get_stmmac_bsp_priv(&pdev->dev);
+
+ stmmac_dvr_remove(&pdev->dev);
+
+ clk_bulk_disable_unprepare(gbeth->num_clks, gbeth->clks);
+}
After dumping the CLK registers I found out that the Rx and Rx-180 CLK
never got turned OFF after unbind.
2] I replaced the remove callback with below ie first turn OFF
Tx-180/Rx/Rx-180 clocks
+static void renesas_gbeth_remove(struct platform_device *pdev)
+{
+ struct renesas_gbeth *gbeth = get_stmmac_bsp_priv(&pdev->dev);
+
+ clk_bulk_disable_unprepare(gbeth->num_clks, gbeth->clks);
+
+ stmmac_dvr_remove(&pdev->dev);
+}
After dumping the CLK registers I confirmed all the clocks were OFF
(CSR/PCLK/Tx/Tx-180/Rx/Rx-180) after unbind operation. Now when I do a
bind operation Rx clock fails to enable, which is probably because the
PHY is not providing any clock.
> However, PHYs that have negotiated EEE are permitted to stop their
> receive clock, which can be enabled by an appropriate control bit.
> phy_eee_rx_clock_stop() manipulates that bit. stmmac has in most
> cases permitted the PHY to stop its receive clock.
>
You mean phy_eee_rx_clock_stop() is the one which tells the PHY to
disable the Rx clocks? Actually I tried the below hunk with this as
well the Rx clock fails to be turned ON after unbind/bind operation.
diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index 0ba434104f5b..e16f4a6f5715 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -1756,6 +1756,7 @@ EXPORT_SYMBOL_GPL(phy_eee_tx_clock_stop_capable);
*/
int phy_eee_rx_clock_stop(struct phy_device *phydev, bool clk_stop_enable)
{
+ return 0;
int ret;
/* Configure the PHY to stop receiving xMII
> NVidia have been a recent victim of this - it is desirable to allow
> receive clock stop, but there hasn't been the APIs in the kernel
> to allow MAC drivers to re-enable the clock when they need it.
>
> Up until now, I had thought this was just a suspend/resume issue
> (which is NVidia's reported case). Your testing suggests that it is
> more widespread than that.
>
> While I've been waiting to hear from you, I've prepared some patches
> that change the solution that I proposed for NVidia (currently on top
> of that patch set).
>
I tried your latest patches [0], this didnt resolve the issue.
(Actually there was a build problem with the patch I fixed it below
hunk)
diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
index 9a62808cf935..21cdea4aec9a 100644
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -2609,7 +2609,7 @@ void phylink_rx_clk_stop_block(struct phylink *pl)
* function has been called and clock-stop was previously enabled.
*/
if (pl->mac_rx_clk_stop_blocked++ == 0 &&
- pl->mac_supports_eee_ops && pl->phydev)
+ pl->mac_supports_eee_ops && pl->phydev &&
pl->config->eee_rx_clk_stop_enable)
phy_eee_rx_clock_stop(pl->phydev, false);
}
[0] https://lore.kernel.org/all/Z8bbnSG67rqTj0pH@xxxxxxxxxxxxxxxxxxxxx/
> However, before I proceed with them, I need you to get to the bottom
> of why:
>
> # ip li set dev $if down
> # ip li set dev $if up
>
> doesn't trigger it, but removing and re-inserting the module does.
>
Doing the above does not turn OFF/ON all the clocks. Looking at the
dump from the CLK driver on my platform only stmmaceth and pclk are
the clocks which get toggled and rest remain ON. Note Im not sure if
the PHY is disabling the Rx clocks when I run ip li set dev $if down I
cannot scope that pin on the board.
Please let me know if you have any pointers for me to look further
into this issue.
Cheers,
Prabhakar