Re: 3.6-rc7 boot crash + bisection
From: Alex Williamson
Date: Tue Sep 25 2012 - 15:44:03 EST
On Tue, 2012-09-25 at 20:54 +0200, Florian Dazinger wrote:
> On Tue, 25 Sep 2012 12:32:50 -0600,
> Alex Williamson <alex.williamson@xxxxxxxxxx> wrote:
>
> > On Mon, 2012-09-24 at 21:03 +0200, Florian Dazinger wrote:
> > > Hi,
> > > I think I've found a regression that causes an early boot crash. I
> > > attached the kernel output as a JPG file, since I do not have a serial
> > > console or something similar.
> > >
> > > after bisection, it boils down to this commit:
> > >
> > > 9dcd61303af862c279df86aa97fde7ce371be774 is the first bad commit
> > > commit 9dcd61303af862c279df86aa97fde7ce371be774
> > > Author: Alex Williamson <alex.williamson@xxxxxxxxxx>
> > > Date: Wed May 30 14:19:07 2012 -0600
> > >
> > > amd_iommu: Support IOMMU groups
> > >
> > > Add IOMMU group support to AMD-Vi device init and uninit code.
> > > Existing notifiers make sure this gets called for each device.
> > >
> > > Signed-off-by: Alex Williamson <alex.williamson@xxxxxxxxxx>
> > > Signed-off-by: Joerg Roedel <joerg.roedel@xxxxxxx>
> > >
> > > :040000 040000 2f6b1b8e104d6dfec0abaa9646750f9b5a4f4060
> > > 837ae95e84f6d3553457c4df595a9caa56843c03 M drivers
> >
> > [switching back to mailing list thread]
> >
> > I asked Florian for dmesg w/ amd_iommu_dump, here's the relevant lines:
> >
> > [ 1.485645] AMD-Vi: device: 00:00.2 cap: 0040 seg: 0 flags: 3e info 1300
> > [ 1.485683] AMD-Vi: mmio-addr: 00000000feb20000
> > [ 1.485901] AMD-Vi: DEV_SELECT_RANGE_START devid: 00:00.0 flags: 00
> > [ 1.485935] AMD-Vi: DEV_RANGE_END devid: 00:00.2
> > [ 1.485969] AMD-Vi: DEV_SELECT devid: 00:02.0 flags: 00
> > [ 1.486002] AMD-Vi: DEV_SELECT_RANGE_START devid: 01:00.0 flags: 00
> > [ 1.486036] AMD-Vi: DEV_RANGE_END devid: 01:00.1
> > [ 1.486070] AMD-Vi: DEV_SELECT devid: 00:04.0 flags: 00
> > [ 1.486103] AMD-Vi: DEV_SELECT devid: 02:00.0 flags: 00
> > [ 1.486137] AMD-Vi: DEV_SELECT devid: 00:05.0 flags: 00
> > [ 1.486170] AMD-Vi: DEV_SELECT devid: 03:00.0 flags: 00
> > [ 1.486204] AMD-Vi: DEV_SELECT devid: 00:06.0 flags: 00
> > [ 1.486238] AMD-Vi: DEV_SELECT devid: 04:00.0 flags: 00
> > [ 1.486271] AMD-Vi: DEV_SELECT devid: 00:07.0 flags: 00
> > [ 1.486305] AMD-Vi: DEV_SELECT devid: 05:00.0 flags: 00
> > [ 1.486338] AMD-Vi: DEV_SELECT devid: 00:09.0 flags: 00
> > [ 1.486372] AMD-Vi: DEV_SELECT devid: 06:00.0 flags: 00
> > [ 1.486406] AMD-Vi: DEV_SELECT devid: 00:0b.0 flags: 00
> > [ 1.486439] AMD-Vi: DEV_SELECT devid: 07:00.0 flags: 00
> > [ 1.486473] AMD-Vi: DEV_ALIAS_RANGE devid: 08:01.0 flags: 00 devid_to: 08:00.0
> > [ 1.486510] AMD-Vi: DEV_RANGE_END devid: 08:1f.7
> > [ 1.486548] AMD-Vi: DEV_SELECT devid: 00:11.0 flags: 00
> > [ 1.486581] AMD-Vi: DEV_SELECT_RANGE_START devid: 00:12.0 flags: 00
> > [ 1.486620] AMD-Vi: DEV_RANGE_END devid: 00:12.2
> > [ 1.486654] AMD-Vi: DEV_SELECT_RANGE_START devid: 00:13.0 flags: 00
> > [ 1.486688] AMD-Vi: DEV_RANGE_END devid: 00:13.2
> > [ 1.486721] AMD-Vi: DEV_SELECT devid: 00:14.0 flags: d7
> > [ 1.486755] AMD-Vi: DEV_SELECT devid: 00:14.3 flags: 00
> > [ 1.486788] AMD-Vi: DEV_SELECT devid: 00:14.4 flags: 00
> > [ 1.486822] AMD-Vi: DEV_ALIAS_RANGE devid: 09:00.0 flags: 00 devid_to: 00:14.4
> > [ 1.486859] AMD-Vi: DEV_RANGE_END devid: 09:1f.7
> > [ 1.486897] AMD-Vi: DEV_SELECT devid: 00:14.5 flags: 00
> > [ 1.486931] AMD-Vi: DEV_SELECT_RANGE_START devid: 00:16.0 flags: 00
> > [ 1.486965] AMD-Vi: DEV_RANGE_END devid: 00:16.2
> > [ 1.487055] AMD-Vi: Enabling IOMMU at 0000:00:00.2 cap 0x40
> >
> >
> > > lspci:
> > > 00:00.0 Host bridge: Advanced Micro Devices [AMD] nee ATI RD890 PCI to PCI bridge (external gfx0 port B) (rev 02)
> > > 00:00.2 IOMMU: Advanced Micro Devices [AMD] nee ATI RD990 I/O Memory Management Unit (IOMMU)
> > > 00:02.0 PCI bridge: Advanced Micro Devices [AMD] nee ATI RD890 PCI to PCI bridge (PCI express gpp port B)
> > > 00:04.0 PCI bridge: Advanced Micro Devices [AMD] nee ATI RD890 PCI to PCI bridge (PCI express gpp port D)
> > > 00:05.0 PCI bridge: Advanced Micro Devices [AMD] nee ATI RD890 PCI to PCI bridge (PCI express gpp port E)
> > > 00:06.0 PCI bridge: Advanced Micro Devices [AMD] nee ATI RD890 PCI to PCI bridge (PCI express gpp port F)
> > > 00:07.0 PCI bridge: Advanced Micro Devices [AMD] nee ATI RD890 PCI to PCI bridge (PCI express gpp port G)
> > > 00:09.0 PCI bridge: Advanced Micro Devices [AMD] nee ATI RD890 PCI to PCI bridge (PCI express gpp port H)
> > > 00:0b.0 PCI bridge: Advanced Micro Devices [AMD] nee ATI RD890 PCI to PCI bridge (NB-SB link)
> > > 00:11.0 SATA controller: Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 SATA Controller [AHCI mode] (rev 40)
> > > 00:12.0 USB controller: Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 USB OHCI0 Controller
> > > 00:12.2 USB controller: Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 USB EHCI Controller
> > > 00:13.0 USB controller: Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 USB OHCI0 Controller
> > > 00:13.2 USB controller: Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 USB EHCI Controller
> > > 00:14.0 SMBus: Advanced Micro Devices [AMD] nee ATI SBx00 SMBus Controller (rev 42)
> > > 00:14.3 ISA bridge: Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 LPC host controller (rev 40)
> > > 00:14.4 PCI bridge: Advanced Micro Devices [AMD] nee ATI SBx00 PCI to PCI Bridge (rev 40)
> > > 00:14.5 USB controller: Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 USB OHCI2 Controller
> > > 00:16.0 USB controller: Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 USB OHCI0 Controller
> > > 00:16.2 USB controller: Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 USB EHCI Controller
> > > 00:18.0 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor HyperTransport Configuration
> > > 00:18.1 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Address Map
> > > 00:18.2 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor DRAM Controller
> > > 00:18.3 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Miscellaneous Control
> > > 00:18.4 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Link Control
> > > 01:00.0 VGA compatible controller: Advanced Micro Devices [AMD] nee ATI RV730XT [Radeon HD 4670]
> > > 01:00.1 Audio device: Advanced Micro Devices [AMD] nee ATI RV710/730 HDMI Audio [Radeon HD 4000 series]
> > > 02:00.0 SATA controller: ASMedia Technology Inc. ASM1062 Serial ATA Controller (rev 01)
> > > 03:00.0 Ethernet controller: Intel Corporation 82583V Gigabit Network Connection
> > > 04:00.0 USB controller: ASMedia Technology Inc. ASM1042 SuperSpeed USB Host Controller
> > > 05:00.0 USB controller: ASMedia Technology Inc. ASM1042 SuperSpeed USB Host Controller
> > > 06:00.0 USB controller: ASMedia Technology Inc. ASM1042 SuperSpeed USB Host Controller
> > > 07:00.0 PCI bridge: PLX Technology, Inc. PEX8112 x1 Lane PCI Express-to-PCI Bridge (rev aa)
> > > 08:04.0 Multimedia audio controller: C-Media Electronics Inc CMI8788 [Oxygen HD Audio]
> >
> > We can see this is clearly wrong:
> >
> > [ 1.486473] AMD-Vi: DEV_ALIAS_RANGE devid: 08:01.0 flags: 00 devid_to: 08:00.0
> > [ 1.486510] AMD-Vi: DEV_RANGE_END devid: 08:1f.7
> >
> > So the BIOS is telling us to alias everything in the range of 08:01.0 to
> > 08:1f.7 to device id 08:00.0, which doesn't exist :( Can you send lspci
> > -vvv? I suspect we'll find that 07:00.0 sources bus 08 and that alias
> > should really be to 07:00.0 instead of 08:00.0. Please also provide
> > dmidecode for this system, we may need to create a quirk for this box.
> > Thanks,
[corrected alias and range in text above, adding iommu list]
> 00:0b.0 PCI bridge: Advanced Micro Devices [AMD] nee ATI RD890 PCI to PCI bridge (NB-SB link) (prog-if 00 [Normal decode])
> Bus: primary=00, secondary=07, subordinate=08, sec-latency=0
> Capabilities: [58] Express (v2) Root Port (Slot+), MSI 00
> 07:00.0 PCI bridge: PLX Technology, Inc. PEX8112 x1 Lane PCI Express-to-PCI Bridge (rev aa) (prog-if 00 [Normal decode])
> Bus: primary=07, secondary=08, subordinate=08, sec-latency=32
> Capabilities: [60] Express (v1) PCI/PCI-X Bridge, MSI 00
> 08:04.0 Multimedia audio controller: C-Media Electronics Inc CMI8788 [Oxygen HD Audio]
> Subsystem: ASUSTeK Computer Inc. Virtuoso 100 (Xonar Essence STX)
> Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> Latency: 32 (500ns min, 6000ns max)
> Interrupt: pin A routed to IRQ 32
> Region 0: I/O ports at b000 [size=256]
> Capabilities: [c0] Power Management version 2
> Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
> Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
> Kernel driver in use: snd_virtuoso
>
Yep, my guess appears correct: the alias should be to device 07:00.0.
This looks like an x1 PCIe card, so I think the PLX bridge is on the
card. The system probably boots fine if you remove the audio card (or,
of course, with amd_iommu=off). There appears to be a BIOS one revision
newer for this motherboard; we should exhaust the possibility that this
bug is already fixed in BIOS 1503 before we implement a quirk. Can you
test this?
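
To make the reasoning concrete, here is a rough sketch of the check, written in Python purely for illustration. It assumes the usual 16-bit device ID layout (bus << 8 | device << 3 | function) and uses only the devices and bridge bus numbers quoted in this thread; `check_alias_range` and the two tables are hypothetical names, not kernel code.

```python
def parse_devid(text):
    """Turn 'bb:dd.f' (hex) into a 16-bit device ID: bus << 8 | dev << 3 | fn."""
    bus, devfn = text.split(":")
    dev, fn = devfn.split(".")
    return (int(bus, 16) << 8) | (int(dev, 16) << 3) | int(fn, 16)


def format_devid(devid):
    """Inverse of parse_devid."""
    return "%02x:%02x.%x" % (devid >> 8, (devid >> 3) & 0x1F, devid & 0x7)


# Subset of the lspci output quoted in this thread (buses 07 and 08 only).
PRESENT = {parse_devid(d) for d in ("00:0b.0", "07:00.0", "08:04.0")}

# Bridge -> secondary bus number, taken from the lspci -vvv excerpt.
BRIDGES = {parse_devid("00:0b.0"): 0x07, parse_devid("07:00.0"): 0x08}


def check_alias_range(start, end, alias_to):
    """Report how many device IDs the range covers, whether the alias target
    exists, and which bridge (if any) has the range's bus as its secondary bus."""
    first, last = parse_devid(start), parse_devid(end)
    bus = first >> 8
    upstream = [format_devid(b) for b, sec in BRIDGES.items() if sec == bus]
    return {
        "entries": last - first + 1,
        "target_present": parse_devid(alias_to) in PRESENT,
        "upstream_bridge": upstream[0] if upstream else None,
    }


# The DEV_ALIAS_RANGE entry reported by the BIOS.
report = check_alias_range("08:01.0", "08:1f.7", "08:00.0")
print(report)
# {'entries': 248, 'target_present': False, 'upstream_bridge': '07:00.0'}
```

The range covers 248 device IDs on bus 08, none of which can resolve to the nonexistent 08:00.0, while 07:00.0 is the bridge whose secondary bus is 08.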

Joerg, any thoughts on a quirk for this? Unfortunately we can't just
skip IOMMU groups when an alias is broken, because that puts at risk the
other IOMMU groups, which might not actually be isolated from this
device. It looks like we parse the alias info before PCI is probed, so
maybe we'd need to call the quirk from iommu_init_device itself.
Thanks,
Alex
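
One possible shape for such a quirk, sketched in Python only to show the control flow. Everything here is an assumption for illustration: `fixup_alias`, its tables, and the policy of falling back to the upstream bridge are hypothetical, and a real fix would live in C in the AMD IOMMU driver. The sketch runs at device-init time, when the PCI topology is already known, and raises instead of silently skipping a device whose alias cannot be repaired.

```python
def parse_devid(text):
    """Turn 'bb:dd.f' (hex) into a 16-bit device ID: bus << 8 | dev << 3 | fn."""
    bus, devfn = text.split(":")
    dev, fn = devfn.split(".")
    return (int(bus, 16) << 8) | (int(dev, 16) << 3) | int(fn, 16)


def fixup_alias(devid, alias, present, secondary_bus):
    """Hypothetical fixup: return a usable alias for devid.

    Keep the firmware-provided alias if it names an existing device.
    Otherwise fall back to the bridge whose secondary bus is the device's
    bus. If no such bridge is known, raise so the caller cannot quietly
    skip grouping for this device.
    """
    if alias in present:
        return alias
    bus = devid >> 8
    for bridge, sec in secondary_bus.items():
        if sec == bus:
            return bridge
    raise LookupError("no valid alias for devid 0x%04x" % devid)


# Devices and bridge bus numbers taken from the lspci output in this thread.
present = {parse_devid(d) for d in ("00:0b.0", "00:14.4", "07:00.0", "08:04.0")}
secondary_bus = {parse_devid("00:0b.0"): 0x07, parse_devid("07:00.0"): 0x08}

# The audio card at 08:04.0 is aliased by the BIOS to the nonexistent 08:00.0.
fixed = fixup_alias(parse_devid("08:04.0"), parse_devid("08:00.0"),
                    present, secondary_bus)
print("%04x" % fixed)  # 0700, i.e. 07:00.0
```

An alias that already points at a real device, such as the 09:xx.x range aliased to 00:14.4, passes through unchanged.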