Re: [BUG] 4.11.0-rc1 panic on shutdown X61s
From: Borislav Petkov
Date: Mon Mar 13 2017 - 21:44:32 EST
On Tue, Mar 14, 2017 at 01:20:27AM +0000, Brown, Aaron F wrote:
> Believe it or not we actually do test these changes. This one was
> tested by me and I did not have the same results you and the other
> people reporting this trace did. I made it back in the lab today and
> have spent a good part of the day attempting to reproduce this bug
> without success. Freeze / resume works for me on all the systems I
> have tried, which includes a sampling of all the current parts and
> many older ones.
Yeah, tell me about it.
> Given there are several other reports of this it is obviously an issue
> and I would like to be able to reproduce it in case another patch to
> resolve the issue this attempts to fix comes back in another form. So
> I want to know what's different between the systems that hit this and
> my bank of systems that don't.
So mine is not the newest anymore: thinkpad x230.
> What exact part (or parts) are we looking at (lspci|grep -i eth)
Lemme give you the gory details (PCI cfg space etc):
$ lspci -xxx -vvvv | grep -i eth -A 36
00:19.0 Ethernet controller: Intel Corporation 82579LM Gigabit Network Connection (rev 04)
	Subsystem: Lenovo 82579LM Gigabit Network Connection
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 30
	Region 0: Memory at f1500000 (32-bit, non-prefetchable) [size=128K]
	Region 1: Memory at f153b000 (32-bit, non-prefetchable) [size=4K]
	Region 2: I/O ports at 4080 [size=32]
	Capabilities: [c8] Power Management version 2
		Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
	Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
		Address: 00000000fee002d8 Data: 0000
	Capabilities: [e0] PCI Advanced Features
		AFCap: TP+ FLR+
		AFCtrl: FLR-
		AFStatus: TP-
	Kernel driver in use: e1000e
	Kernel modules: e1000e
00: 86 80 02 15 07 04 10 00 04 00 00 02 00 00 00 00
10: 00 00 50 f1 00 b0 53 f1 81 40 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 aa 17 f3 21
30: 00 00 00 00 c8 00 00 00 00 00 00 00 07 01 00 00
40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 01 d0 22 c8 00 20 00 07
d0: 05 e0 81 00 d8 02 e0 fe 00 00 00 00 00 00 00 00
e0: 13 00 06 03 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
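
For anyone who wants to cross-check the raw bytes against the decoded lines above, here is a rough Python sketch that walks the capability list in this dump and decodes the MSI capability. It is only an illustration, not anything from the e1000e driver: the helper names (parse_dump, walk_caps, decode_msi) are made up for this example, the all-zero rows are left out, and the field offsets are the standard PCI ones (status at 0x06, capability pointer at 0x34, MSI capability ID 0x05).

```python
# Rows copied from the dump above; rows that are all zero are omitted.
DUMP = """
00: 86 80 02 15 07 04 10 00 04 00 00 02 00 00 00 00
10: 00 00 50 f1 00 b0 53 f1 81 40 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 aa 17 f3 21
30: 00 00 00 00 c8 00 00 00 00 00 00 00 07 01 00 00
c0: 00 00 00 00 00 00 00 00 01 d0 22 c8 00 20 00 07
d0: 05 e0 81 00 d8 02 e0 fe 00 00 00 00 00 00 00 00
e0: 13 00 06 03 00 00 00 00 00 00 00 00 00 00 00 00
"""

def parse_dump(text):
    """Turn 'off: xx xx ...' rows into a 256-byte config space image."""
    cfg = bytearray(256)
    for line in text.strip().splitlines():
        offset, _, hexbytes = line.partition(":")
        data = bytes.fromhex(hexbytes.strip())
        base = int(offset, 16)
        cfg[base:base + len(data)] = data
    return bytes(cfg)

def u16(cfg, off):
    return int.from_bytes(cfg[off:off + 2], "little")

def u32(cfg, off):
    return int.from_bytes(cfg[off:off + 4], "little")

def walk_caps(cfg):
    """Return [(offset, capability_id), ...] by following the next pointers."""
    caps = []
    if not u16(cfg, 0x06) & 0x10:      # Status bit 4: capability list present
        return caps
    ptr = cfg[0x34] & 0xFC             # capability pointer, low 2 bits reserved
    seen = set()
    while ptr and ptr not in seen:     # guard against a looping list
        seen.add(ptr)
        caps.append((ptr, cfg[ptr]))
        ptr = cfg[ptr + 1] & 0xFC
    return caps

def decode_msi(cfg, off):
    """Decode the MSI capability (ID 0x05) located at offset off."""
    ctrl = u16(cfg, off + 2)
    is_64bit = bool(ctrl & 0x80)
    addr = u32(cfg, off + 4)
    if is_64bit:
        addr |= u32(cfg, off + 8) << 32
    data = u16(cfg, off + (12 if is_64bit else 8))
    return {
        "enabled": bool(ctrl & 0x01),
        "is_64bit": is_64bit,
        "maskable": bool(ctrl & 0x100),
        "address": addr,
        "data": data,
    }

if __name__ == "__main__":
    cfg = parse_dump(DUMP)
    for off, cap_id in walk_caps(cfg):
        print(f"[{off:02x}] capability id 0x{cap_id:02x}")
    msi_off = next(off for off, cap_id in walk_caps(cfg) if cap_id == 0x05)
    print(decode_msi(cfg, msi_off))
```

The result should agree with what lspci already printed: capabilities at c8, d0 and e0, and MSI enabled with a 64-bit address of fee002d8. Note that this only reflects the state when the dump was taken on a running system, not the state at the point where the MSI setup fails.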
> that trigger this? Could it be a difference in .config files? The
.config attached.
> trace says it is falling back to legacy interrupts, does the system
> continue to work and does the network continue to function in that
> mode?
Not really. I tried halting it after the splat but it started powering
down and deadlocked on something. Had to cold-reset.
> Any other information you think can help me reproduce the issue would
> be appreciated.
So the real question is why does it fail setting up MSI interrupts. I'd
look into that part of the driver...
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
Attachment:
config-4.10.0+.gz
Description: application/gzip