Re: [debug patch] printk: Add a printk killswitch to robustify NMI watchdog messages

From: Arne Jansen
Date: Sun Jun 05 2011 - 15:30:48 EST


On 05.06.2011 20:59, Ingo Molnar wrote:

* Arne Jansen <lists@xxxxxxxxxxxxxx> wrote:

hm, it's hard to interpret that without the spin_lock()/unlock()
logic keeping the dumps apart.

The locking was in place from the beginning. [...]

Ok, i was surprised it looked relatively ordered :-)

[...] As the output is still scrambled, there are other sources for
BUG/WARN outside the watchdog that trigger in parallel. Maybe we
should protect the whole BUG/WARN mechanism with a lock and send it
to early_printk from the beginning, so we don't have to wait for
the watchdog to kill printk off and the first BUG can come through.
Or just let WARN/BUG kill off printk instead of the watchdog
(though I have to get rid of that syslog-WARN on startup).
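
A minimal userspace sketch of the "let the first WARN/BUG kill off printk" idea, assuming a single atomic flag as the killswitch. The names printk_killed, log_msg and report_bug are hypothetical and are not kernel APIs; stderr merely stands in for an early_printk-style side channel.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical killswitch: once set, the regular log path stays silent. */
atomic_bool printk_killed = false;

/* Regular logging path. Returns true if the message was emitted. */
bool log_msg(const char *msg)
{
    if (atomic_load(&printk_killed))
        return false;
    fputs(msg, stdout);
    return true;
}

/*
 * Report path. The first caller flips the killswitch and emits its report
 * on the side channel; later callers are dropped, so their output cannot
 * interleave with the first report. Returns true if the report was emitted.
 */
bool report_bug(const char *msg)
{
    if (atomic_exchange(&printk_killed, true))
        return false;   /* another reporter got here first */
    fprintf(stderr, "BUG: %s\n", msg);
    return true;
}
```

With this shape the first report comes through without waiting for a watchdog to flip the switch, at the price of losing every later report.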

I had yet another look at your lockup.txt and i think the main cause
is the WARN_ON() caused by the not-held pi_lock. The lockup there
causes other CPUs to wedge in printk, which triggers spinlock-lockup
messages there.

So i think the primary trigger is the pi_lock WARN_ON() (as your
bisection has confirmed that too), everything else comes from this.

Unfortunately i don't think we can really 'fix' the problem by
removing the assert. By all means the assert is correct: pi_lock
should be held there. If we are not holding it then we likely won't
crash in an easily visible way - it's a lot easier to trigger asserts
than to trigger obscure side-effects of locking bugs.
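
To illustrate why the assert is worth keeping, here is a small single-threaded model, assuming a lock that only remembers whether it is held. The types and helpers (tracked_lock, assert_held, set_cpu) are hypothetical and do not reproduce the scheduler code; the point is only that the warning fires on the unlocked call even though the update itself appears to succeed.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model of a lock that records whether it is currently held. */
struct tracked_lock {
    bool held;
};

struct task {
    struct tracked_lock pi_lock;
    int cpu;
};

int warn_count;   /* number of failed held-lock checks so far */

void tracked_lock_acquire(struct tracked_lock *l) { l->held = true; }
void tracked_lock_release(struct tracked_lock *l) { l->held = false; }

/* Warn, but keep going, if the caller did not take the lock. */
void assert_held(const struct tracked_lock *l, const char *what)
{
    if (!l->held) {
        warn_count++;
        fprintf(stderr, "WARNING: %s not held\n", what);
    }
}

/* An update that, in this model, is only safe under pi_lock. */
void set_cpu(struct task *t, int cpu)
{
    assert_held(&t->pi_lock, "pi_lock");
    t->cpu = cpu;
}
```

Calling set_cpu() without the lock still stores the new value, so nothing visibly breaks; only the warning exposes the missing lock.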

It is also a mystery why only printk() triggers this bug. The wakeup
done there is not particularly special, so by all means we should
have seen similar lockups elsewhere as well - not just with
printk()s. Yet we are not seeing them.

From the timing I see, I'd guess it has something to do with the
scheduler kicking in during printk. I'm familiar with neither the
printk code nor the scheduler.
If you have any ideas about what I should test or add, please let me know.

-Arne


So some essential piece of the puzzle is still missing.

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/