Re: [patch] increase spinlock-debug looping timeouts (write_lock and NMI)

From: Andrew Morton
Date: Tue Jun 20 2006 - 17:09:34 EST


On Tue, 20 Jun 2006 09:11:36 -0700 (PDT)
Dave Olson <olson@xxxxxxxxxxxx> wrote:

> On Mon, 19 Jun 2006, Andrew Morton wrote:
> | > We'll see very long delays when 8 MPI processes exit "simultaneously", and sometimes
> | > get NMI, sometimes system hangs, and sometimes just hung up for many seconds (and
> | > often in that state, doing sysrq-P or sysrq-T will make things happy again).
> | >
> |
> | OK. I assume these processes have done a mmap(MAP_SHARED) of a lot of
> | memory?
>
> Yep. Some shared with kernel modules, some of device address space.
>
> | > A typical trace looks like this (on an fc4 2.6.16 kernel):
> |
> | fc4? You seem to have an RH-FCx which doesn't enable
> | CONFIG_DEBUG_SPINLOCK. Or maybe we didn't have all that debug code in
> | 2.6.16. Doesn't matter, really.
>
> Intended to be more or less stock fc4 but with CONFIG_PCI_MSI=y and
> 2.6.17-based patch so the 8131 MSI quirk isn't enabled.
>
> From the config file:
> CONFIG_DEBUG_SPINLOCK=y
> CONFIG_DEBUG_SPINLOCK_SLEEP=y

OK, I goofed again.

It would be super-interesting to know whether CONFIG_DEBUG_SPINLOCK=n
improves things.

> | With a -stable backport. I suspect this is triggerable on demand.
>
> So far we've only got the one test case, but it's quite reliable.
> We hit one of the 3 cases (long "hangs" of > 60 seconds at exit,
> NMI, or dead system hang) every time we run the test case (well,
> perhaps 1 out of 20 times everything is "just fine", probably
> something perturbs it enough to let one or more processes get
> through the critical section ahead of the whole gang).

Reproducibility is a win.

You should have complained earlier!