Re: [PATCH] r8169: don't use MSI-X on RTL8106e

From: Heiner Kallweit
Date: Tue Aug 21 2018 - 16:54:29 EST


On 21.08.2018 10:28, Marc Zyngier wrote:
> On 20/08/18 19:44, Bjorn Helgaas wrote:
>> [+cc Marc, Thomas, Christoph, linux-pci]
>> (beginning of thread at [1])
>>
>> On Thu, Aug 16, 2018 at 09:50:48PM +0200, Heiner Kallweit wrote:
>>> On 16.08.2018 21:39, David Miller wrote:
>>>> From: Heiner Kallweit <hkallweit1@xxxxxxxxx>
>>>> Date: Thu, 16 Aug 2018 21:37:31 +0200
>>>>
>>>>> On 16.08.2018 21:21, David Miller wrote:
>>>>>> From: <jian-hong@xxxxxxxxxxxx>
>>>>>> Date: Wed, 15 Aug 2018 14:21:10 +0800
>>>>>>
>>>>>>> Found the ethernet network on ASUS X441UAR doesn't come back on resume
>>>>>>> from suspend when using MSI-X. The chip is RTL8106e - version 39.
>>>>>>
>>>>>> Heiner, please take a look at this.
>>>>>>
>>>>>> You recently disabled MSI-X on RTL8168g for similar reasons.
>>>>>>
>>>>>> Now that we've seen two chips like this, maybe there is some other
>>>>>> problem afoot.
>>>>>>
>>>>> Thanks for the hint. I saw it already and just contacted Realtek
>>>>> whether they are aware of any MSI-X issues with particular chip
>>>>> versions. With the chip versions I have access to MSI-X works fine.
>>>>>
>>>>> There's also the theoretical option that the issues are caused by
>>>>> broken BIOS's. But so far only chip versions have been reported
>>>>> which are very similar, at least with regard to version number
>>>>> (2x VER_40, 1x VER_39). So they may share some buggy component.
>>>>>
>>>>> Let's see whether Realtek can provide some hint.
>>>>> If more chip versions are reported having problems with MSI-X,
>>>>> then we could switch to a whitelist or disable MSI-X in general.
>>>>
>>>> It could be that we need to reprogram some register(s) on resume,
>>>> which normally might not be needed, and that is what is causing the
>>>> problem with some chips.
>>>>
>>> Indeed. That's what I'm checking with Realtek.
>>> In the register list in the r8169 driver there's one entry which
>>> seems to indicate that there are MSI-X specific settings.
>>> However this register isn't used, and the r8168 vendor driver
>>> uses only MSI. And there are no public datasheets.
>>
>> Do we have any information about these chip versions in other systems?
>> Or other devices using MSI-X in the same ASUS system? It seems
>> possible that there's some PCI core or suspend/resume issue with MSI-X
>> and this patch just avoids it without fixing the root cause.
>>
>> It might be useful to have a kernel.org bugzilla with the complete
>> dmesg, "sudo lspci -vv" output, and /proc/interrupts contents archived
>> for future reference.
>
> The one system I have with a Realtek chip seems happy enough with MSI-X,
> but it never gets suspended.

Other owners of affected chip versions had the same experience: MSI-X
works fine until resume from suspend.

> There is comment in the patch that I don't quite get:
>
>> It is the IRQ 127 - PCI-MSI used by enp2s0. However, lspci lists MSI is
>> disabled and MSI-X is enabled which conflicts to the interrupt table.
>
> What do you mean by "conflicts"? With what? Another question is whether
> you've loaded any firmware (some versions of the Realtek HW seem to require
> it).
>
These "conflicts" were a misunderstanding, which has been clarified with
the reporter. The irq chip name "PCI-MSI" in the /proc/interrupts output
was interpreted as meaning that an MSI irq is used, not an MSI-X irq.
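
Since the irq chip name in /proc/interrupts does not distinguish the two,
the Enable flags in the lspci capability list are the more reliable
indicator. A minimal sketch, assuming "lspci -vv" output in the format
quoted further down; the helper name msi_state is hypothetical and not
part of any existing tool:

```python
import re

# Matches capability lines such as
#   "Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+"
#   "Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-"
# It does not match "Express (v2) Endpoint, MSI 01", which is unrelated.
CAP_RE = re.compile(r"Capabilities: \[[0-9a-f]+\] (MSI-X|MSI): Enable([+-])")


def msi_state(lspci_text):
    """Return {"MSI": bool|None, "MSI-X": bool|None} for one device.

    None means the capability was not found in the given text.
    Assumes lspci_text covers a single device from "lspci -vv".
    """
    state = {"MSI": None, "MSI-X": None}
    for line in lspci_text.splitlines():
        m = CAP_RE.search(line)
        if m:
            state[m.group(1)] = (m.group(2) == "+")
    return state
```

With the capability lines from Marc's system this would report MSI as
disabled and MSI-X as enabled, which is consistent with the driver
using MSI-X even though /proc/interrupts shows an MSI-style chip name.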

The firmware is for the PHY only; at least that's my experience with
the chip versions I have for testing.

> For the posterity, some data from my own system, which I don't know if it
> has any relevance to the problem at hand.
>
> Thanks,
>
> M.
>
> [ 2.624963] r8169 0000:02:00.0 eth0: RTL8168g/8111g, 5a:fe:ad:ce:11:00, XID 4c000800, IRQ 26
> [ 2.633398] r8169 0000:02:00.0 eth0: jumbo features [frames: 9200 bytes, tx checksumming: ko]
>
> 26: 50 997005 0 0 MSI 1048576 Edge enp2s0
>
> 02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 0c)
> Subsystem: Realtek Semiconductor Co., Ltd. RTL8111/8168 PCI Express Gigabit Ethernet controller
> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> Latency: 0, Cache Line Size: 64 bytes
> Interrupt: pin A routed to IRQ 25
> Region 0: I/O ports at 1000 [size=256]
> Region 2: Memory at 100004000 (64-bit, prefetchable) [size=4K]
> Region 4: Memory at 100000000 (64-bit, prefetchable) [size=16K]
> Capabilities: [40] Power Management version 3
> Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
> Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
> Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
> Address: 0000000000000000 Data: 0000
> Capabilities: [70] Express (v2) Endpoint, MSI 01
> DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
> ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 0.000W
> DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
> RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-
> MaxPayload 128 bytes, MaxReadReq 4096 bytes
> DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr+ TransPend-
> LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s unlimited, L1 <64us
> ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
> LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
> ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+, OBFF Via message/WAKE#
> DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
> LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
> Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
> Compliance De-emphasis: -6dB
> LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
> EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
> Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
> Vector table: BAR=4 offset=00000000
> PBA: BAR=4 offset=00000800
> Capabilities: [d0] Vital Product Data
> pcilib: sysfs_read_vpd: read failed: Input/output error
> Not readable
> Capabilities: [100 v1] Advanced Error Reporting
> UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
> CESta: RxErr+ BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
> CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
> AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
> Capabilities: [140 v1] Virtual Channel
> Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
> Arb: Fixed- WRR32- WRR64- WRR128-
> Ctrl: ArbSelect=Fixed
> Status: InProgress-
> VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
> Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
> Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
> Status: NegoPending- InProgress-
> Capabilities: [160 v1] Device Serial Number 00-00-00-00-00-00-00-00
> Capabilities: [170 v1] Latency Tolerance Reporting
> Max snoop latency: 0ns
> Max no snoop latency: 0ns
> Kernel driver in use: r8169
>
>