PROBLEM: Intel I210AT NIC resets while using PCI passthrough on ESXi (regression)
From: Foster Snowhill
Date: Sat Jul 23 2016 - 11:29:55 EST
[1.] One line summary of the problem:
Intel I210AT NIC resets while using PCI passthrough on ESXi (regression)
[2.] Full description of the problem/report:
Intel I210AT NIC running on the latest (4.7-rc7) kernel revision suffers a "transmit queue 0 timed out" issue, which makes the NIC unusable. Each time this happens, the adapter is reset. To trigger the bug, bring up the adapter and attempt to transmit something. In my case it was enough to configure IPv4 on it and ping something. The more data is attempted to be transmitted, the more often the abovementioned error occurs.
The system runs on a VMware ESXi 6.0 hypervisor and the NIC is attached to the Linux VM via PCI passthrough. BIOS and ESXi are all latest stable versions as of June 23rd, 2016.
A bisect was performed, which pointed to commit 52f518a3a7c2f80551a38d38be28bc9f335e713c (x86/MSI: Use hierarchical irqdomains to manage MSI interrupts) being the first bad one. Full bisect log is below in part [9.].
It was noted that sometimes the bug does not reproduce the first time any given kernel version is booted, however subsequent boots reproduce the issue.
The following appears in the kernel log when the bug occurs (also appended relevant information, MAC addresses edited out).
[ 2.099565] igb: Intel(R) Gigabit Ethernet Network Driver - version 5.3.0-k
[ 2.100488] igb: Copyright (c) 2007-2014 Intel Corporation.
[ 2.106832] igb 0000:1b:00.0: enabling device (0000 -> 0002)
[ 2.146246] igb 0000:1b:00.0: added PHC on eth1
[ 2.147211] igb 0000:1b:00.0: Intel(R) Gigabit Ethernet Network Connection
[ 2.148468] igb 0000:1b:00.0: eth1: (PCIe:5.0Gb/s:unknown) xx:xx:xx:xx:xx:xx
[ 2.149846] igb 0000:1b:00.0: eth1: PBA No: 001300-000
[ 2.150900] igb 0000:1b:00.0: Using MSI-X interrupts. 1 rx queue(s), 1 tx queue(s)
[ 2.502960] igb 0000:1b:00.0 eth-wan2: renamed from eth1
[ 15.836869] igb 0000:1b:00.0 eth-wan2: igb: eth-wan2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
[ 20.848027] ------------[ cut here ]------------
[ 20.848033] WARNING: CPU: 0 PID: 0 at /home/kernel/COD/linux/net/sched/sch_generic.c:272 dev_watchdog+0x22d/0x230
[ 20.848034] NETDEV WATCHDOG: eth-wan2 (igb): transmit queue 0 timed out
[ 20.848035] Modules linked in: vmw_vsock_vmci_transport vsock fuse intel_powerclamp coretemp intel_rapl_perf ppdev vmw_balloon joydev nls_utf8 nls_cp437 vfat fat efi_pstore efivars serio_raw sg battery parport_pc parport shpchp i2c_piix4 vmw_vmci acpi_cpufreq ac tpm_tis tpm evdev sch_fq_codel ib_iser rdma_cm iw_cm ib_cm ib_core configfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi efivarfs autofs4 ext4 crc16 jbd2 mbcache btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath linear md_mod hid_generic usbhid hid sr_mod cdrom crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel jitterentropy_rng sd_mod hmac drbg ansi_cprng aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd psmouse
[ 20.848061] igb dca ptp pps_core i2c_algo_bit ata_generic vmxnet3 uhci_hcd ehci_pci ehci_hcd vmwgfx vmw_pvscsi ata_piix ahci libahci usbcore usb_common ttm libata drm_kms_helper scsi_mod drm fjes button
[ 20.848069] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.7.0-040700rc7-generic #201607110032
[ 20.848070] Hardware name: VMware, Inc. VMware7,1/440BX Desktop Reference Platform, BIOS VMW71.00V.0.B64.1506250318 06/25/2015
[ 20.848071] 0000000000000286 223eaef37d48856a ffffffff81321265 ffff88003c603e10
[ 20.848072] 0000000000000000 ffffffff81078a3e 0000000000000000 ffff88003c603e68
[ 20.848073] ffff88003434c000 0000000000000000 ffff880034cf1940 0000000000000008
[ 20.848075] Call Trace:
[ 20.848076] <IRQ> [<ffffffff81321265>] ? dump_stack+0x5c/0x77
[ 20.848082] [<ffffffff81078a3e>] ? __warn+0xbe/0xe0
[ 20.848084] [<ffffffff81078abf>] ? warn_slowpath_fmt+0x5f/0x80
[ 20.848086] [<ffffffff8150cc9d>] ? dev_watchdog+0x22d/0x230
[ 20.848087] [<ffffffff8150ca70>] ? qdisc_rcu_free+0x40/0x40
[ 20.848089] [<ffffffff810e0680>] ? call_timer_fn+0x30/0x120
[ 20.848090] [<ffffffff8150ca70>] ? qdisc_rcu_free+0x40/0x40
[ 20.848091] [<ffffffff810e0ffc>] ? run_timer_softirq+0x23c/0x2f0
[ 20.848094] [<ffffffff815ee5e8>] ? __do_softirq+0xf8/0x28e
[ 20.848095] [<ffffffff8107e32b>] ? irq_exit+0x9b/0xa0
[ 20.848096] [<ffffffff815ee3fe>] ? smp_apic_timer_interrupt+0x3e/0x50
[ 20.848097] [<ffffffff815ec722>] ? apic_timer_interrupt+0x82/0x90
[ 20.848098] <EOI> [<ffffffff8105bd72>] ? native_safe_halt+0x2/0x10
[ 20.848102] [<ffffffff810354d8>] ? default_idle+0x18/0xc0
[ 20.848103] [<ffffffff810b9429>] ? cpu_startup_entry+0x2d9/0x330
[ 20.848106] [<ffffffff81b2df73>] ? start_kernel+0x44f/0x46f
[ 20.848108] [<ffffffff81b2d120>] ? early_idt_handler_array+0x120/0x120
[ 20.848109] [<ffffffff81b2d41c>] ? x86_64_start_kernel+0x14f/0x173
[ 20.848110] ---[ end trace 41fb9507befdccac ]---
[ 20.848125] igb 0000:1b:00.0 eth-wan2: Reset adapter
[ 24.836669] igb 0000:1b:00.0 eth-wan2: igb: eth-wan2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
[4.] Kernel information
[4.1.] Kernel version (from /proc/version):
Linux version 4.7.0-040700rc7-generic (kernel@gloin) (gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1) ) #201607110032 SMP Mon Jul 11 04:34:25 UTC 2016
[5.] Most recent kernel that did not have the bug:
Version 4.1 in mainline was the most recent kernel version without the bug.
[8.] Environment
[8.5.] PCI information ('lspci -vvv' as root)
1b:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)
DeviceName: pciPassthru1
Subsystem: ASUSTeK Computer Inc. I210 Gigabit Network Connection
Physical Slot: 256
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 64, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 17
Region 0: Memory at fd100000 (32-bit, non-prefetchable) [size=512K]
Region 2: I/O ports at 3000 [disabled] [size=32]
Region 3: Memory at fd180000 (32-bit, non-prefetchable) [size=16K]
Capabilities: [40] Power Management version 3
Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
Address: 0000000000000000 Data: 0000
Masking: 00000000 Pending: 00000000
Capabilities: [70] MSI-X: Enable+ Count=5 Masked-
Vector table: BAR=3 offset=00000000
PBA: BAR=3 offset=00002000
Capabilities: [a0] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset-
DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
MaxPayload 128 bytes, MaxReadReq 128 bytes
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
LnkCap: Port #0, Speed 5GT/s, Width x32, ASPM L0s, Exit Latency L0s <64ns, L1 <1us
ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 5GT/s, Width x32, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-, OBFF Not Supported
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
Capabilities: [100 v2] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
AERCap: First Error Pointer: 14, GenCap+ CGenEn- ChkCap+ ChkEn-
Capabilities: [140 v1] Device Serial Number 60-a4-4c-ff-ff-e9-0b-a9
Capabilities: [1a0 v1] Transaction Processing Hints
Device specific mode supported
Steering table in TPH capability structure
Kernel driver in use: igb
Kernel modules: igb
[8.7.] /proc/interrupts
CPU0
0: 19 IO-APIC 2-edge timer
1: 9 IO-APIC 1-edge i8042
4: 47798 IO-APIC 4-edge serial
8: 1 IO-APIC 8-edge rtc0
9: 0 IO-APIC 9-fasteoi acpi
12: 15 IO-APIC 12-edge i8042
14: 0 IO-APIC 14-edge ata_piix
15: 0 IO-APIC 15-edge ata_piix
16: 65 IO-APIC 16-fasteoi vmwgfx, uhci_hcd:usb2
17: 0 IO-APIC 17-fasteoi ehci_hcd:usb1
24: 0 PCI-MSI 344064-edge PCIe PME, pciehp
25: 0 PCI-MSI 346112-edge PCIe PME, pciehp
26: 0 PCI-MSI 348160-edge PCIe PME, pciehp
27: 0 PCI-MSI 350208-edge PCIe PME, pciehp
28: 0 PCI-MSI 352256-edge PCIe PME, pciehp
29: 0 PCI-MSI 354304-edge PCIe PME, pciehp
30: 0 PCI-MSI 356352-edge PCIe PME, pciehp
31: 0 PCI-MSI 358400-edge PCIe PME, pciehp
32: 0 PCI-MSI 360448-edge PCIe PME, pciehp
33: 0 PCI-MSI 362496-edge PCIe PME, pciehp
34: 0 PCI-MSI 364544-edge PCIe PME, pciehp
35: 0 PCI-MSI 366592-edge PCIe PME, pciehp
36: 0 PCI-MSI 368640-edge PCIe PME, pciehp
37: 0 PCI-MSI 370688-edge PCIe PME, pciehp
38: 0 PCI-MSI 372736-edge PCIe PME, pciehp
39: 0 PCI-MSI 374784-edge PCIe PME, pciehp
40: 0 PCI-MSI 376832-edge PCIe PME, pciehp
41: 0 PCI-MSI 378880-edge PCIe PME, pciehp
42: 0 PCI-MSI 380928-edge PCIe PME, pciehp
43: 0 PCI-MSI 382976-edge PCIe PME, pciehp
44: 0 PCI-MSI 385024-edge PCIe PME, pciehp
45: 0 PCI-MSI 387072-edge PCIe PME, pciehp
46: 0 PCI-MSI 389120-edge PCIe PME, pciehp
47: 0 PCI-MSI 391168-edge PCIe PME, pciehp
48: 0 PCI-MSI 393216-edge PCIe PME, pciehp
49: 0 PCI-MSI 395264-edge PCIe PME, pciehp
50: 0 PCI-MSI 397312-edge PCIe PME, pciehp
51: 0 PCI-MSI 399360-edge PCIe PME, pciehp
52: 0 PCI-MSI 401408-edge PCIe PME, pciehp
53: 0 PCI-MSI 403456-edge PCIe PME, pciehp
54: 0 PCI-MSI 405504-edge PCIe PME, pciehp
55: 0 PCI-MSI 407552-edge PCIe PME, pciehp
56: 584 PCI-MSI 1064960-edge ahci[0000:02:01.0]
57: 4293 PCI-MSI 1572864-edge vmw_pvscsi
60: 0 PCI-MSI 14155776-edge eth-wan2
61: 0 PCI-MSI 14155777-edge eth-wan2-rx-0
62: 0 PCI-MSI 14155778-edge eth-wan2-tx-0
63: 154 PCI-MSI 129024-edge vmw_vmci
64: 0 PCI-MSI 129025-edge vmw_vmci
NMI: 0 Non-maskable interrupts
LOC: 52617 Local timer interrupts
SPU: 0 Spurious interrupts
PMI: 0 Performance monitoring interrupts
IWI: 0 IRQ work interrupts
RTR: 0 APIC ICR read retries
RES: 0 Rescheduling interrupts
CAL: 0 Function call interrupts
TLB: 0 TLB shootdowns
TRM: 0 Thermal event interrupts
THR: 0 Threshold APIC interrupts
DFR: 0 Deferred Error APIC interrupts
MCE: 0 Machine check exceptions
MCP: 4 Machine check polls
ERR: 0
MIS: 0
PIN: 0 Posted-interrupt notification event
PIW: 0 Posted-interrupt wakeup event
[9.] Full git bisect log
git bisect start
# good: [b953c0d234bc72e8489d3bf51a276c5c4ec85345] Linux 4.1
git bisect good b953c0d234bc72e8489d3bf51a276c5c4ec85345
# bad: [d770e558e21961ad6cfdf0ff7df0eb5d7d4f0754] Linux 4.2-rc1
git bisect bad d770e558e21961ad6cfdf0ff7df0eb5d7d4f0754
# bad: [4570a37169d4b44d316f40b2ccc681dc93fedc7b] Merge tag 'sound-4.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
git bisect bad 4570a37169d4b44d316f40b2ccc681dc93fedc7b
# bad: [4e241557fc1cb560bd9e77ca1b4a9352732a5427] Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
git bisect bad 4e241557fc1cb560bd9e77ca1b4a9352732a5427
# bad: [44d21c3f3a2ef2f58b18bda64c52c99e723f3f4a] Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
git bisect bad 44d21c3f3a2ef2f58b18bda64c52c99e723f3f4a
# good: [e75c73ad64478c12b3a44b86a3e7f62a4f65b93e] Merge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect good e75c73ad64478c12b3a44b86a3e7f62a4f65b93e
# bad: [43224b96af3154cedd7220f7b90094905f07ac78] Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect bad 43224b96af3154cedd7220f7b90094905f07ac78
# bad: [9dda1658a9bd450d65da5153a2427955785d17c2] Merge branch 'x86/asm' into x86/core, to prepare for new patch
git bisect bad 9dda1658a9bd450d65da5153a2427955785d17c2
# bad: [191a66353b22fad8ac89404ab4c929cbe7b0afb2] Merge branch 'x86/asm' into x86/apic, to resolve a conflict
git bisect bad 191a66353b22fad8ac89404ab4c929cbe7b0afb2
# bad: [099c5c03487f6bca30c628e14e666788dd61fb33] irq_remapping/vt-d: Move struct irq_2_iommu into intel_irq_remapping.c
git bisect bad 099c5c03487f6bca30c628e14e666788dd61fb33
# bad: [90d84fe95dd6b418383aa0e0e5cace8f1b1e7e30] x86/MSI: Replace msi_update_msg() with irq_chip_compose_msi_msg()
git bisect bad 90d84fe95dd6b418383aa0e0e5cace8f1b1e7e30
# good: [3cb96f0c97330834929abe9bd2ca3c252a83def0] x86/hpet: Enhance HPET IRQ to support hierarchical irqdomains
git bisect good 3cb96f0c97330834929abe9bd2ca3c252a83def0
# bad: [b1855c752e67d1125d41fadb499014b49a245db8] x86/MSI: Clean up unused MSI related code and interfaces
git bisect bad b1855c752e67d1125d41fadb499014b49a245db8
# bad: [2b43817e481da9f5118adb56aef46b3f0298c685] irq_remapping/vt-d: Clean up unused MSI related code
git bisect bad 2b43817e481da9f5118adb56aef46b3f0298c685
# bad: [80aa283364a17998dceb577bd185e3380b927544] x86/irq: Directly call native_compose_msi_msg() for DMAR IRQ
git bisect bad 80aa283364a17998dceb577bd185e3380b927544
# bad: [52f518a3a7c2f80551a38d38be28bc9f335e713c] x86/MSI: Use hierarchical irqdomains to manage MSI interrupts
git bisect bad 52f518a3a7c2f80551a38d38be28bc9f335e713c
# first bad commit: [52f518a3a7c2f80551a38d38be28bc9f335e713c] x86/MSI: Use hierarchical irqdomains to manage MSI interrupts