Re: PROBLEM: kernel hang in ohci init

From: Satyam Sharma
Date: Thu Jul 12 2007 - 23:16:17 EST


Hi Timo,

Thanks for your report!

On 7/12/07, Timo Lindemann <tlindemann@xxxxxxxx> wrote:
a problem report to something giving me a real headache:

[2.] The version 2.6.22 of the linux kernel hangs when initializing the
integrated ohci controller of the nvidia MCP51 chipset (pci device ids
vendor:product == 10de:26d). I have traced through various printks that
pci_init calls pci_fixup_device, later on in quirk_usb_ohci_handoff
(file linux/drivers/usb/host/pci-quirks.c) kernel freezes in this
section:
...
if (control & OHCI_CTRL_IR) {
int wait_time = 500;
writel(OHCI_INTR_OC, base + OHCI_INTRENABLE);
writel(OHCI_ORC, base + OHCI_CMDSTATUS); // this never returns
...
after this, kernel apparently goes into busy waiting (fans gradually
turn louder) and hangs indefinitely. I have also made sure that writel
(in linux/include/asm/io.h) really is entered, but never returns.

[ Added David Brownell to Cc: ]

[6.] Reproducible by booting any version 2.6.21+ on that machine
(nvidia MCP51-Chipset, see the lspci output)
[...]
[7.7] What is striking about that problem is that kernel 2.6.20.6 does
not even enter the section mentioned in [2.].

2.6.20 works, 2.6.21 doesn't, right? You could try git-bisect on Linus'
tree (if you can use git) to find the offending commit that broke it:

http://www.kernel.org/pub/software/scm/git/docs/git-bisect.html

and

http://www.reactivated.net/weblog/archives/2006/01/
using-git-bisect-to-find-buggy-kernel-patches/
(long URL broken in 2 lines)

You don't need to "make clean" between git-bisect builds, but be
prepared to lose a couple of hours on this still :-)

[7.1] the ver_linux output under 2.6.20.6, in the directory of 2.6.22,
says:

Gnu C 4.2.1

Others have reported problems booting with gcc-4.2-compiled
kernels too. Could you try building with 4.1?

Modules Loaded rt2500* nvidia* forcedeth

* nvidia and rt2500 are most assuredly not involved in this. They are
not loaded by that kernel.
[...]
[7.3] no modules have been configured (all in-kernel)

You're saying there are no modules, then how come those three
are loaded? Also try reproducing the problem without proprietary
(nvidia) drivers, please.

[7.5] (I cannot run this with 2.6.22. In 2.6.20.6, the output can be
retrieved from http://cip.uni-trier.de/~lindem/lspci.txt as this is
really large)

Ok.

[X.] I tried hard to understand what's going on, but ultimately, I could
not yet write a fix, workaround, or anything like that, so I am asking
for help/enlightenment, or even an already-done fix. Really very
sorry. Also, different options like noapic, nolapic, acpi=off,
pci=routeirq|biosirq|usepirqmask were already tried; I also tried
disabling quirks for that particular vendor:device-combination, which
leads to another freeze further along. Also, commenting the writel()
will hang indefinitely in the following wait_time loop.

I can only guess that it might
have to do with the patch
"commit 4302a595cd9c6363b495460497ecbda49fa16858
Author: Benjamin Herrenschmidt <benh@xxxxxxxxxxxxxxxxxxx>
Date: Fri Dec 15 06:53:55 2006 +1100
USB: Rework the OHCI quirk mecanism as suggested by David
"
but I don't really have a clue, so this might be groundless suspicion.
If so, I apologize about that.

As mentioned earlier, git-bisect could help us narrow this down.
It's not a silver bullet, but often useful.

[ BTW, just-after a new kernel release is often an unlucky period to
report bugs, it appears ... everybody gets busy with not missing the
merge window to push in their shiny new stuff :-) ]

Thanks,
Satyam
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/