Re: [PATCH] r8169: Reinstate ALDPS and ASPM support

From: Heiner Kallweit
Date: Tue Jun 05 2018 - 15:24:43 EST


On 05.06.2018 19:28, Bjorn Helgaas wrote:
> On Tue, Jun 05, 2018 at 06:34:09AM +0000, Ryankao wrote:
>> Add realtek folk Hau
>>
>> -----Original Message-----
>> From: Kai Heng Feng [mailto:kai.heng.feng@xxxxxxxxxxxxx]
>> Sent: Tuesday, June 05, 2018 1:02 PM
>> To: jrg.otte@xxxxxxxxx
>> Cc: David Miller <davem@xxxxxxxxxxxxx>; Hayes Wang <hayeswang@xxxxxxxxxxx>; hkallweit1@xxxxxxxxx; romieu@xxxxxxxxxxxxx; Linux Netdev List <netdev@xxxxxxxxxxxxxxx>; Linux Kernel Mailing List <linux-kernel@xxxxxxxxxxxxxxx>; Ryankao <ryankao@xxxxxxxxxxx>
>> Subject: Re: [PATCH] r8169: Reinstate ALDPS and ASPM support
>>
>> Hi JÃrg Otte,
>>
>> Can you give this patch a try?
>>
>> Since you are the only one that reported ALDPS/ASPM regression,
>>
>> And I think this patch should solve the issue you had [1].
>>
>> Hopefully we don't need to go down the rabbit hole of blacklist/whitelist...
>>
>> Kai-Heng
>>
>> [1] https://lkml.org/lkml/2013/1/5/36
>
> I have no idea what ALDPS is. It's not mentioned in the PCIe spec, so
> presumably it's some Realtek-specific thing. ASPM is a generic PCIe
> thing. Changes to these two things should be in separate patches so
> they don't get tangled up.
>
>>> On Jun 5, 2018, at 12:58 PM, Kai-Heng Feng
>>> <kai.heng.feng@xxxxxxxxxxxxx>
>>> wrote:
>>>
>>> This patch reinstate ALDPS and ASPM support on r8169.
>>>
>>> On some Intel platforms, ASPM support on r8169 is the key factor to
>>> let Package C-State achieve PC8. Without ASPM support, the deepest
>>> Package C-State can hit is PC3. PC8 can save additional ~3W in
>>> comparison with PC3.
>>>
>>> This patch is from Realtek.
>>>
>>> Fixes: e0c075577965 ("r8169: enable ALDPS for power saving")
>>> Fixes: d64ec841517a ("r8169: enable internal ASPM and clock request
>>> settings")
>
>>> +3507,15 @@ static void rtl8168e_1_hw_phy_config(struct
>>> rtl8169_private *tp)
>>> rtl_writephy(tp, 0x0d, 0x4007);
>>> rtl_writephy(tp, 0x0e, 0x0000);
>>> rtl_writephy(tp, 0x0d, 0x0000);
>>> +
>>> + /* Check ALDPS bit, disable it if enabled */
>>> + rtl_writephy(tp, 0x1f, 0x0000);
>>> + if (enable_aldps)
>>> + rtl_w0w1_phy(tp, 0x15, 0x1000, 0x0000);
>>> + else if (rtl_readphy(tp, 0x15) & 0x1000)
>>> + rtl_w0w1_phy(tp, 0x15, 0x0000, 0x1000);
>
> There's a lot of repetition of this code with minor variations. You
> could probably factor it out and make it more concise and more
> readable.
>
>>> +static void rtl8169_check_link_status(struct net_device *dev,
>>> + struct rtl8169_private *tp) {
>>> + struct device *d = tp_to_dev(tp);
>>> +
>>> + if (tp->link_ok(tp)) {
>>> + rtl_link_chg_patch(tp);
>>> + /* This is to cancel a scheduled suspend if there's one. */
>>> + if (pm_request_resume(d))
>>> + _rtl_reset_work(tp);
>>> + netif_carrier_on(dev);
>>> + if (net_ratelimit())
>>> + netif_info(tp, ifup, dev, "link up\n");
>>> + } else {
>>> + netif_carrier_off(dev);
>>> + netif_info(tp, ifdown, dev, "link down\n");
>>> + pm_runtime_idle(d);
>>> + }
>>> +}
>
> This function apparently just got moved around without changing
> anything. That's fine, but the move should be in a separate patch to
> make the real changes easier to review.
>
>>> @@ -7649,8 +7757,12 @@ static int rtl_init_one(struct pci_dev *pdev,
>>> const struct pci_device_id *ent)
>>>
>>> /* disable ASPM completely as that cause random device stop working
>>> * problems as well as full system hangs for some PCIe devices users */
>>> - pci_disable_link_state(pdev, PCIE_LINK_STATE_L0S | PCIE_LINK_STATE_L1 |
>>> - PCIE_LINK_STATE_CLKPM);
>>> + if (!enable_aspm) {
>>> + pci_disable_link_state(pdev, PCIE_LINK_STATE_L0S |
>>> + PCIE_LINK_STATE_L1 |
>>> + PCIE_LINK_STATE_CLKPM);
>>> + netif_info(tp, probe, dev, "ASPM disabled\n");
>>> + }
>
> ASPM is a generic PCIe feature that should be configured by the PCI
> core without any help from the device driver.
>
> If code in the driver is needed, that means either the PCI core is
> doing it wrong and we should fix it there, or the device is broken and
> the driver is working around the erratum.
>
> If this is an erratum, you should include details about exactly what's
> broken and (ideally) a URL to the published erratum. Otherwise this
> is just unmaintainable black magic and likely to be broken by future
> ASPM changes in the PCI core.
>
Fully agree, but: There are no publicly available datasheets and only
source for such magic is the r8168 vendor driver and trial&error.

In addition the driver supports ~ 50 chip variants and not all variants
(unfortunately nobody except Realtek knows which) are affected by the
problem. Maybe the involved Realtek guys can shed some light on this.

> ASPM configuration is done by the PCI core before drivers are bound to
> the device. If you need device-specific workarounds, they should
> probably be in quirks so they're done before the core does that ASPM
> configuration.
>
>>> /* enable device (incl. PCI PM wakeup and hotplug setup) */
>>> rc = pcim_enable_device(pdev);
>>> --
>>> 2.17.0
>>
>> ------Please consider the environment before printing this e-mail.
>