Re: [PATCH] e1000e: Power cycle phy on PM resume
From: Kai-Heng Feng
Date: Thu Sep 24 2020 - 08:51:14 EST
Hi Andrew,
> On Sep 23, 2020, at 23:37, Andrew Lunn <andrew@xxxxxxx> wrote:
>
> On Wed, Sep 23, 2020 at 10:44:10PM +0800, Kai-Heng Feng wrote:
>> Hi Andrew,
>>
>>> On Sep 23, 2020, at 20:17, Andrew Lunn <andrew@xxxxxxx> wrote:
>>>
>>> On Wed, Sep 23, 2020 at 03:47:51PM +0800, Kai-Heng Feng wrote:
>>>> We are seeing the following error after S3 resume:
>>>> [ 704.746874] e1000e 0000:00:1f.6 eno1: Setting page 0x6020
>>>> [ 704.844232] e1000e 0000:00:1f.6 eno1: MDI Write did not complete
>>>> [ 704.902817] e1000e 0000:00:1f.6 eno1: Setting page 0x6020
>>>> [ 704.903075] e1000e 0000:00:1f.6 eno1: reading PHY page 769 (or 0x6020 shifted) reg 0x17
>>>> [ 704.903281] e1000e 0000:00:1f.6 eno1: Setting page 0x6020
>>>> [ 704.903486] e1000e 0000:00:1f.6 eno1: writing PHY page 769 (or 0x6020 shifted) reg 0x17
>>>> [ 704.943155] e1000e 0000:00:1f.6 eno1: MDI Error
>>>> ...
>>>> [ 705.108161] e1000e 0000:00:1f.6 eno1: Hardware Error
>>>>
>>>> Since we don't know what platform firmware may do to the phy, so let's
>>>> power cycle the phy upon system resume to resolve the issue.
>>>>
>>>> Signed-off-by: Kai-Heng Feng <kai.heng.feng@xxxxxxxxxxxxx>
>>>> ---
>>>> drivers/net/ethernet/intel/e1000e/netdev.c | 2 ++
>>>> 1 file changed, 2 insertions(+)
>>>>
>>>> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
>>>> index 664e8ccc88d2..c2a87a408102 100644
>>>> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
>>>> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
>>>> @@ -6968,6 +6968,8 @@ static __maybe_unused int e1000e_pm_resume(struct device *dev)
>>>> !e1000e_check_me(hw->adapter->pdev->device))
>>>> e1000e_s0ix_exit_flow(adapter);
>>>>
>>>> + e1000_power_down_phy(adapter);
>>>> +
>>>
>>> static void e1000_power_down_phy(struct e1000_adapter *adapter)
>>> {
>>> struct e1000_hw *hw = &adapter->hw;
>>>
>>> /* Power down the PHY so no link is implied when interface is down *
>>> * The PHY cannot be powered down if any of the following is true *
>>> * (a) WoL is enabled
>>> * (b) AMT is active
>>> * (c) SoL/IDER session is active
>>> */
>>> if (!adapter->wol && hw->mac_type >= e1000_82540 &&
>>> hw->media_type == e1000_media_type_copper) {
>>
>> Looks like the the function comes from e1000, drivers/net/ethernet/intel/e1000/e1000_main.c.
>> However, this patch is for e1000e, so the function with same name is different.
>
> Ah! Sorry. Missed that. Also it is not nice there are two functions in
> the kernel with the same name.
>
>>> Could it be coming out of S3 because it just received a WoL?
>>
>> No, the issue can be reproduced by pressing keyboard or rtcwake.
>
> Not relevant now, since i was looking at the wrong function. But i was
> meaning the call is a NOP in the case WoL caused the wake up. So if
> the issues can also happen after WoL, your fix is not going to fix it.
>
>>> It seems unlikely that it is the MII_CR_POWER_DOWN which is helping,
>>> since that is an MDIO write itself. Do you actually know how this call
>>> to e1000_power_down_phy() fixes the issues?
>>
>
>> I don't know from hardware's perspective, but I think the comment on
>> e1000_power_down_phy_copper() can give us some insight:
>
> And there is only one function called e1000_power_down_phy_copper()
> :-)
>
>>
>> /**
>> * e1000_power_down_phy_copper - Restore copper link in case of PHY power down
>> * @hw: pointer to the HW structure
>> *
>> * In the case of a PHY power down to save power, or to turn off link during a
>> * driver unload, or wake on lan is not enabled, restore the link to previous
>> * settings.
>> **/
>> void e1000_power_down_phy_copper(struct e1000_hw *hw)
>> {
>> u16 mii_reg = 0;
>>
>> /* The PHY will retain its settings across a power down/up cycle */
>> e1e_rphy(hw, MII_BMCR, &mii_reg);
>> mii_reg |= BMCR_PDOWN;
>> e1e_wphy(hw, MII_BMCR, mii_reg);
>> usleep_range(1000, 2000);
>> }
>
> I don't really see how this explains this:
>
>>>> [ 704.746874] e1000e 0000:00:1f.6 eno1: Setting page 0x6020
>>>> [ 704.844232] e1000e 0000:00:1f.6 eno1: MDI Write did not complete
>
> https://elixir.bootlin.com/linux/latest/source/drivers/net/ethernet/intel/e1000e/phy.c#L181
>
> So first off, the comments are all cut/paste from
> e1000e_read_phy_reg_mdic(). It would be nice to s/read/write/g in that
> function.
Ah yes...
>
> So it sets up the transaction and starts it. MDIO is a serial bus with
> no acknowledgements. You clock out around 64 bits, and hope the PHY
> receives it. The time it takes to send those 64 bits is fixed by the
> bus speed, typically 2.5MHz.
Thanks for the info.
>
> So the driver polls waiting for the hardware to say the bits have been
> sent. And this is timing out. How long that takes has nothing to do
> with the PHY, or what state it is in. Powering down the PHY has no
> effect on the MDIO bus master, and how long it takes to shift those
> bits out. Which is why i don't think this patch is correct. This is
> probably an MDIO bus issue, not a PHY issue.
Thanks for pointing out the possible root cause.
Indeed this looks like an MDIO issue so this patch is completely wrong.
I'll send a V2, thanks.
Kai-Heng
>
> Try dumping the value of MDIC in the good/bad case before the
> transaction starts.
>
> Andrew