Re: [debug patch] printk: Add a printk killswitch to robustify NMI watchdog messages

From: Arne Jansen
Date: Sun Jun 05 2011 - 13:42:31 EST


On 05.06.2011 19:20, Ingo Molnar wrote:

> * Arne Jansen <lists@xxxxxxxxxxxxxx> wrote:
>
> > With the all-locks-printed output we can double-check what locks
> > are held.
> >
> > btw, the output posted earlier also contains some "BUG: spinlock
> > lockup" messages.
>
> hm, it's hard to interpret that without the spin_lock()/unlock()
> logic keeping the dumps apart.

The locking was in place from the beginning. Since the output is still
scrambled, there must be other sources of BUG/WARN output outside the
watchdog that trigger in parallel. Maybe we should protect the whole
BUG/WARN mechanism with a lock and send its output to early_printk from
the start, so we don't have to wait for the watchdog to kill printk off
before the first BUG can come through.
Or just let WARN/BUG kill off printk instead of the watchdog (though
then I have to get rid of that syslog-WARN on startup).
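
Something along the lines of the sketch below is what I have in mind.
Untested, and the names are invented (bug_printk() is not an existing
kernel function); early_printk() is only available with
CONFIG_EARLY_PRINTK and needs a registered early_console. The point is
to serialize all BUG/WARN reporting under one private lock and to
bypass the logbuf_lock path completely, so the first report gets out
intact even when printk itself is wedged:

static arch_spinlock_t bug_lock = __ARCH_SPIN_LOCK_UNLOCKED;

void bug_printk(const char *fmt, ...)
{
	/* static buffer is fine, it's protected by bug_lock */
	static char buf[512];
	unsigned long flags;
	va_list args;

	local_irq_save(flags);
	arch_spin_lock(&bug_lock);	/* keep concurrent dumps apart */

	va_start(args, fmt);
	vscnprintf(buf, sizeof(buf), fmt, args);
	va_end(args);

	early_printk("%s", buf);	/* console directly, no logbuf_lock */

	arch_spin_unlock(&bug_lock);
	local_irq_restore(flags);
}

Using arch_spin_lock() directly keeps lockdep and the spinlock
debugging code out of the reporting path, which matters here because
those are exactly the paths that recurse into printk.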


> Was lockdep enabled as you started the test?

At least it was in the config, but I haven't double-checked. At the
moment, it is.


> but ... if the log is reasonably sorted then it's this one:

> <0>BUG: spinlock lockup on CPU#3, modprobe/22211, ffffffff81e1c0c0
> Pid: 22211, comm: modprobe Tainted: G W 2.6.39-rc3+ #19
> Call Trace:
> [<ffffffff813af306>] do_raw_spin_lock+0x156/0x170
> [<ffffffff8185ce71>] _raw_spin_lock+0x51/0x70
> [<ffffffff81092df6>] ? vprintk+0x76/0x4a0
> [<ffffffff81092df6>] vprintk+0x76/0x4a0
> [<ffffffff810c5f8d>] ? trace_hardirqs_off+0xd/0x10
> [<ffffffff81859e19>] printk+0x63/0x65
> [<ffffffff813af301>] do_raw_spin_lock+0x151/0x170
> [<ffffffff8108a4bd>] ? try_to_wake_up+0x29d/0x350
> [<ffffffff8185ce71>] _raw_spin_lock+0x51/0x70
> [<ffffffff81092df6>] ? vprintk+0x76/0x4a0
> [<ffffffff81092df6>] vprintk+0x76/0x4a0
> [<ffffffff8108758b>] ? cpuacct_charge+0x9b/0xb0
> [<ffffffff8108750f>] ? cpuacct_charge+0x1f/0xb0
> [<ffffffff8108a4bd>] ? try_to_wake_up+0x29d/0x350
> [<ffffffff81859e19>] printk+0x63/0x65
> [<ffffffff813af090>] spin_bug+0x70/0xf0
> [<ffffffff813af2d9>] do_raw_spin_lock+0x129/0x170
> [<ffffffff8108a4bd>] ? try_to_wake_up+0x29d/0x350
> [<ffffffff8185ce71>] _raw_spin_lock+0x51/0x70
> [<ffffffff81092df6>] ? vprintk+0x76/0x4a0

> and it occurred before the lockup in the scheduler.

> Which could be due to a race between disabling lockdep on one CPU and
> the scheduler doing the lock-held check on another CPU.

> Do you get any messages after the assert is removed, during the test?

No.
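
For what it's worth, the race suspected above can be modeled in
userspace. The sketch below is purely illustrative (in the kernel the
flag is debug_locks, cleared atomically by debug_locks_off() in
lib/debug_locks.c); it shows the check-then-report window where one
CPU clears the flag while another has already passed its own check:

/* gcc -pthread race.c && ./race */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <unistd.h>

static atomic_int debug_locks = 1;	/* stand-in for the kernel flag */

static void *disable_side(void *arg)
{
	(void)arg;
	/* like debug_locks_off(): atomically clear the flag once */
	if (atomic_exchange(&debug_locks, 0))
		printf("CPU0: flag cleared, first report goes out\n");
	return NULL;
}

static void *check_side(void *arg)
{
	(void)arg;
	if (atomic_load(&debug_locks)) {	/* may still read 1 ... */
		usleep(1000);			/* ... then lose the race */
		printf("CPU1: check passed, flag now %d -> spurious report\n",
		       atomic_load(&debug_locks));
	}
	return NULL;
}

int main(void)
{
	pthread_t t0, t1;

	pthread_create(&t1, NULL, check_side, NULL);
	pthread_create(&t0, NULL, disable_side, NULL);
	pthread_join(t0, NULL);
	pthread_join(t1, NULL);
	return 0;
}

Whether the window fires depends entirely on timing, so a spurious
"BUG: spinlock lockup" from this path would be sporadic.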