Re: [patch] increase spinlock-debug looping timeouts (write_lockand NMI)

From: Dave Olson
Date: Tue Jun 20 2006 - 12:10:19 EST


On Mon, 19 Jun 2006, Andrew Morton wrote:
| > We'll see very long delays when 8 MPI processes exit "simultaneously", and sometimes
| > get NMI, sometimes system hangs, and sometimes just hung up for many seconds (and
| > often in that state, doing sysrq-P or sysrq-T will make things happy again).
| >
|
| OK. I assume these processes have done a mmap(MAP_SHARED) of a lot of
| memory?

Yep. Some shared with kernel modules, some of device address space.

| > A typical trace looks like this (on an fc4 2.6.16 kernel):
|
| fc4? You seem to have an RH-FCx which doesn't enable
| CONFIG_DEBUG_SPINLOCK. Or maybe we didn't have all that debug code in
| 2.6.16. Doesn't matter, really.

Intended to be more or less stock fc4 but with CONFIG_PCI_MSI=y and
2.6.17-based patch so the 8131 MSI quirk isn't enabled.

>From the config file:
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_SPINLOCK_SLEEP=y

| With a -stable backport. I suspect this is triggerable on demand.

So far we've only got the one test case, but it's quite reliable.
We hit one of the 3 cases (long > 60 seconds) "hangs" at exit,
NMI, or dead system hang, every time we run the test case (well,
perhaps 1 out of 20 times everything is "just fine", probably
something perturbs it enough to let one or more processes get
through the critical section ahead of the whole gang).

Dave Olson
olson@xxxxxxxxxxxx
http://www.unixfolk.com/dave
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/