Re: [patch] increase spinlock-debug looping timeouts from 1 sec to 1 min

From: Andrew Morton
Date: Mon Jun 19 2006 - 15:59:19 EST


On Mon, 19 Jun 2006 13:39:44 +0200
Ingo Molnar <mingo@xxxxxxx> wrote:

>
> * Andrew Morton <akpm@xxxxxxxx> wrote:
>
> > > The write_trylock + __delay in the loop is not a problem or a bug, as
> > > the trylock will at most _increase_ the delay - and our goal is to not
> > > have a false positive, not to be absolutely accurate about the
> > > measurement here.
> >
> > Precisely. We have delays of over a second (but we don't know how
> > much more than a second). Let's say two seconds. The NMI watchdog
> > timeout is, what? Five seconds?
>
> i don't see the problem.

It's taking over a second to acquire a write_lock. A lock which is
unlikely to be held for more than a microsecond anywhere. That's really
bad, isn't it? Being on the edge of an NMI watchdog induced system crash
is bad, too.

> We'll have tried that lock hundreds of thousands
> of times before this happens. The NMI watchdog will only trigger if we
> do this with IRQs disabled.

tree_lock uses write_lock_irq().

> And it's not like the normal
> __write_lock_failed codepath would be any different: for heavily
> contended workloads the overhead is likely in the cacheline bouncing,
> not in the __delay().

Yes, it might also happen with !CONFIG_DEBUG_SPINLOCK. We need to find out
if that's so and if so, why.

> > That's getting too close. The result will be a total system crash.
> > And RH are shipping this.
>
> I don't see a connection. Pretty much the only thing the loop condition
> impacts is the condition under which we print out a 'i think we
> deadlocked' message.

I'm assuming that the additional delay in the debug code has worsened the
situation.
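
For readers following along, the loop under discussion has roughly the
shape below. This is a user-space sketch only, not the kernel code: every
demo_* name is hypothetical, the "lock" is a single C11 atomic flag, and
the busy-wait merely stands in for __delay(). It also returns -1 once the
try budget is used up, which is an assumption made so the outcome can be
checked; it is not a claim about what the debug code does after printing.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for a write lock: 0 = free, 1 = write-held. */
typedef struct {
    atomic_int held;
} demo_rwlock;

/* One non-blocking attempt; true if we took the lock. */
static bool demo_write_trylock(demo_rwlock *l)
{
    int expected = 0;
    return atomic_compare_exchange_strong(&l->held, &expected, 1);
}

static void demo_write_unlock(demo_rwlock *l)
{
    atomic_store(&l->held, 0);
}

/* Short busy-wait, standing in for the per-iteration delay. */
static void demo_delay(void)
{
    for (volatile int i = 0; i < 10; i++)
        ;
}

/*
 * Try the lock up to max_loops times, delaying after each failure.
 * Returns the number of failed attempts before success, or -1 if the
 * budget ran out (the point where a "lockup suspected" message is
 * printed in this sketch).
 */
static int64_t demo_write_lock_debug(demo_rwlock *l, uint64_t max_loops)
{
    for (uint64_t i = 0; i < max_loops; i++) {
        if (demo_write_trylock(l))
            return (int64_t)i;
        demo_delay();   /* every failed attempt adds this much waiting */
    }
    fprintf(stderr, "BUG: write-lock lockup suspected after %llu tries\n",
            (unsigned long long)max_loops);
    return -1;
}
```

The delay is paid once per failed attempt, so under heavy contention the
time between a lock release and our next attempt can grow, which is the
effect I'm assuming above. Raising max_loops only changes when the
message is printed, not how long the lock actually takes to acquire.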

> Have i missed your point perhaps?

I get that impression ;) If it takes 1-2 seconds to get this lock then it
can take five seconds: a) that's just gross, and b) the NMI watchdog will
nuke the box.

Why is it taking so long to get the lock?

Does it happen in non-debug mode?

What do we do about it?