RE: [patch] PCI: disable MSI on more ATI NorthBridges
From: Shane Huang
Date: Fri Oct 19 2007 - 09:17:36 EST
Hi Guys:
> This logic seems backwards, to me. "shoot first, ask questions later"
> To me this it not how to approach this problem.
>
> Once you turn MSI off, there is next to no incentive to fix the
> problem because users aren't running into it any longer.
>
> The only two devices in that bug report which should be using MSI
> would be the SATA controller and the broadcom ethernet NIC. And by
> the failed bootup logs provided by the user the problem is clearly
> with the SATA controller.
>
> One common problem we're finding is that some devices have a hardware
> bug where setting INTX_DISABLE in the PCI COMMAND register masks MSI
> interrupts too.
>
> I mention this because the user in that report mentions that the
> kernel upgrade causes the failure, and one thing we started doing not
> too long ago was to set the INTX_DISABLE bit when MSI is enabled for a
> device.
Yes, you are right, to find out the root cause is better. Thank you
for all your suggestion and information to us.
Since we have little experience on PCI and MSI here, we had to try to
disable MSI before we find a better solution. But as you are giving
us help now on this case, it's great! Then let's go...
> So maybe this SATA controller has this problem too. It is easy to
> test, simply comment out all of the pci_intx() function calls in
> drivers/pci/msi.c and perform a test boot with MSI enabled.
>
> I would rather you approach analysis of these kinds of MSI bugs in
> this manner, instead of disabling MSI wholesale. Because with the
> latter approach it is nearly guarenteed that the real reason will only
> be discovered with an extremely low priority.
I'm using kernel 2.6.23-rc5 to debug this MSI problem, which can NOT
boot on our Trevally board(RS690+SB700) without any kernel modification.
But if I comment out all the pci_intx() function calls in
drivers/pci/msi.c, it can boot now with MSI enabled as you expected!
# cat /proc/interrupts
CPU0 CPU1
0: 318 174060 IO-APIC-edge timer
8: 0 1 IO-APIC-edge rtc
9: 0 0 IO-APIC-fasteoi acpi
16: 0 204 IO-APIC-fasteoi HDA Intel
17: 0 479 IO-APIC-fasteoi ohci_hcd:usb1, ohci_hcd:usb2, ehci_hcd:usb6
18: 1 2 IO-APIC-fasteoi ohci_hcd:usb3, ohci_hcd:usb4, ohci_hcd:usb5
19: 0 0 IO-APIC-fasteoi ehci_hcd:usb7
22: 4 1 IO-APIC-fasteoi yenta
8412: 0 1315 PCI-MSI-edge eth0
8413: 381 4858 PCI-MSI-edge ahci
NMI: 0 0
LOC: 174285 174210
ERR: 0
Also if I keep the pci_intx() calls in drivers/pci/msi.c and ONLY
comment out the pci_intx() call in drivers/ata/ahci.c
My system can boot up too with MSI enabled!
So does it mean that the root cause is our SB700 SATA controller
has a hardware bug where setting INTX_DISABLE in the PCI COMMAND
register masks MSI interrupts too?
And what is the software solution or workaround?
I will continue debug this MSI problem next week. Any suggestions,
please don't hesitate to tell us.
Thanks
Best Regards
Shane
_________________________________________________________________
News, entertainment and everything you care about at Live.com. Get it now!
http://www.live.com/getstarted.aspx
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/