Re: Regression: 2.6.34 boot fails on E5405 system, bisected: de08e2c26

From: Robert Hancock
Date: Tue Jul 13 2010 - 21:56:30 EST


On 07/13/2010 07:17 PM, Ben Greear wrote:
On 07/13/2010 05:36 PM, Ben Greear wrote:
We're seeing boot failures on multiple machines, running FC8 and
F11. I bisected on an FC8 32-bit system. Newer hardware works,
but these older ones do not.

A console log of the hang is found later in this email.

Please let me know if you would like any additional information,
and I will be happy to test patches.

The same failure happens in 2.6.34.1, so the fix does not appear to
be in the stable tree yet.


I added some printks to the offending code. It seems the problem
is that the fixed_bar_cap function in arch/x86/pci/mrst.c loops forever:

# Endless loop of this spewing to console...

pcie_cap: 268435456Checking vendor..
pos after shift: 256
Before read..

Can you print out bus->number and devfn and look them up in lspci to find out which device it's hitting? It looks like there's a device with a PCI Express extended capability header that has an extended capability ID of 0000h and a next capability offset of 100h, which points to itself, causing the infinite loop. (The printed pcie_cap value, 268435456, is 0x10000000: capability ID 0000h, version 0, next offset 100h, i.e. 256, the same position that was just read.) I'm guessing that if pcie_cap >> 20 <= pos, it should give up and break out of the loop, since that means the next capability pointer is invalidly pointing to the same or a previous entry.
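A minimal sketch of the guard suggested above, written as plain user-space C rather than against the kernel's PCI config accessors. The macro names, the read_cfg_dword/write_cfg_dword helpers, and the byte array standing in for the 4 KiB config space are all hypothetical; only the header layout (capability ID in bits 15:0, version in bits 19:16, next offset in bits 31:20) follows the PCI Express extended capability format. This is not the code in arch/x86/pci/mrst.c.

```c
#include <assert.h>
#include <stdint.h>

/* Extended capability header fields (one dword):
 * bits 15:0 = capability ID, bits 19:16 = version,
 * bits 31:20 = offset of the next capability (0 ends the list). */
#define EXT_CAP_ID(h)   ((uint32_t)(h) & 0xffffu)
#define EXT_CAP_VER(h)  (((uint32_t)(h) >> 16) & 0xfu)
#define EXT_CAP_NEXT(h) (((uint32_t)(h) >> 20) & 0xffcu)

#define EXT_CFG_START 0x100  /* extended capabilities begin at offset 256 */
#define EXT_CFG_SIZE  0x1000 /* 4 KiB extended config space */

/* Hypothetical config read: little-endian dword from a byte array. */
static uint32_t read_cfg_dword(const uint8_t *cfg, int pos)
{
    assert(pos >= 0 && pos + 3 < EXT_CFG_SIZE);
    return (uint32_t)cfg[pos] | ((uint32_t)cfg[pos + 1] << 8) |
           ((uint32_t)cfg[pos + 2] << 16) | ((uint32_t)cfg[pos + 3] << 24);
}

/* Hypothetical helper for building a fake config space image. */
static void write_cfg_dword(uint8_t *cfg, int pos, uint32_t val)
{
    assert(pos >= 0 && pos + 3 < EXT_CFG_SIZE);
    cfg[pos] = (uint8_t)val;
    cfg[pos + 1] = (uint8_t)(val >> 8);
    cfg[pos + 2] = (uint8_t)(val >> 16);
    cfg[pos + 3] = (uint8_t)(val >> 24);
}

/* Return the offset of extended capability cap_id, or 0 if not found.
 * The walk only ever moves forward: a next pointer that is 0 (end of
 * list) or <= the current position (self or backward link, such as
 * header 0x10000000 at offset 0x100) stops the loop. */
static int find_ext_cap(const uint8_t *cfg, uint16_t cap_id)
{
    int pos = EXT_CFG_START;

    for (;;) {
        uint32_t header = read_cfg_dword(cfg, pos);
        int next;

        if (header == 0 || header == 0xffffffffu)
            return 0; /* no extended capabilities present */
        if (EXT_CAP_ID(header) == (uint32_t)cap_id)
            return pos;

        next = (int)EXT_CAP_NEXT(header);
        if (next <= pos) /* give up instead of looping forever */
            return 0;
        pos = next;
    }
}
```

With the header from the console log written at offset 0x100, EXT_CAP_NEXT yields 0x100, so a search for any other ID (for example the vendor-specific ID 0x000b) returns 0 instead of spinning. Because each step must strictly increase pos and the offset is capped at 0xffc, the loop is bounded; a fixed iteration budget would be an alternative guard, but the forward-only check also rejects backward links directly.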