Re: Linux 3.1-rc9

From: Simon Kirby
Date: Tue Oct 25 2011 - 11:26:42 EST


On Tue, Oct 18, 2011 at 01:12:41PM -0700, Linus Torvalds wrote:

> On Tue, Oct 18, 2011 at 12:48 PM, Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
> >
> > It does not look related.
>
> Yeah, the only lock held there seems to be the socket lock, and it
> looks like all CPU's are spinning on it.
>
> > Could you try to reproduce that problem with
> > lockdep enabled? lockdep might make it go away, but it's definitely
> > worth a try.
>
> And DEBUG_SPINLOCK / DEBUG_SPINLOCK_SLEEP too. Maybe you're triggering
> some odd networking thing. It sounds unlikely, but maybe some error
> case you get into doesn't release the socket lock.
>
> I think PROVE_LOCKING already enables DEBUG_SPINLOCK, but the sleeping
> lock thing is separate, iirc.

I think the config option you had in mind is CONFIG_DEBUG_ATOMIC_SLEEP,
which selects CONFIG_PREEMPT_COUNT.
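
For reference, a rough sketch of the .config fragment this implies for a
debug build. This is an assumption about a reasonable setup, not the exact
config used on the machine in this report; options pulled in via "select"
are listed explicitly only for readability.

```
# Lock debugging discussed in this thread (sketch, not the reporter's config)
CONFIG_PROVE_LOCKING=y
# selected by PROVE_LOCKING
CONFIG_DEBUG_SPINLOCK=y
# separate option for the sleep-in-atomic check
CONFIG_DEBUG_ATOMIC_SLEEP=y
# selected by DEBUG_ATOMIC_SLEEP
CONFIG_PREEMPT_COUNT=y
```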

By the way, we got this WARN_ON_ONCE while running lockdep elsewhere:

	/*
	 * We can walk the hash lockfree, because the hash only
	 * grows, and we are careful when adding entries to the end:
	 */
	list_for_each_entry(class, hash_head, hash_entry) {
		if (class->key == key) {
			WARN_ON_ONCE(class->name != lock->name);
			return class;
		}
	}

[19274.691090] ------------[ cut here ]------------
[19274.691107] WARNING: at kernel/lockdep.c:690 __lock_acquire+0xfd6/0x2180()
[19274.691112] Hardware name: PowerEdge 2950
[19274.691115] Modules linked in: drbd lru_cache cn ipmi_devintf ipmi_si ipmi_msghandler sata_sil24 bnx2
[19274.691137] Pid: 4416, comm: heartbeat Not tainted 3.1.0-hw-lockdep+ #52
[19274.691141] Call Trace:
[19274.691149] [<ffffffff81098f96>] ? __lock_acquire+0xfd6/0x2180
[19274.691156] [<ffffffff8105c4f0>] warn_slowpath_common+0x80/0xc0
[19274.691163] [<ffffffff8105c545>] warn_slowpath_null+0x15/0x20
[19274.691169] [<ffffffff81098f96>] __lock_acquire+0xfd6/0x2180
[19274.691175] [<ffffffff8109a2e9>] ? lock_release_non_nested+0x1a9/0x340
[19274.691181] [<ffffffff8109a7b9>] lock_acquire+0x109/0x140
[19274.691185] [<ffffffff8104a302>] ? double_rq_lock+0x52/0x80
[19274.691191] [<ffffffff813a4f8a>] ? __delay+0xa/0x10
[19274.691197] [<ffffffff816f55fa>] _raw_spin_lock_nested+0x3a/0x50
[19274.691201] [<ffffffff8104a302>] ? double_rq_lock+0x52/0x80
[19274.691205] [<ffffffff8104a302>] double_rq_lock+0x52/0x80
[19274.691210] [<ffffffff81058167>] load_balance+0x897/0x16e0
[19274.691215] [<ffffffff81058199>] ? load_balance+0x8c9/0x16e0
[19274.691219] [<ffffffff8104d172>] ? update_shares+0xd2/0x150
[19274.691226] [<ffffffff816f2572>] ? __schedule+0x842/0xa20
[19274.691232] [<ffffffff816f2608>] __schedule+0x8d8/0xa20
[19274.691238] [<ffffffff816f2572>] ? __schedule+0x842/0xa20
[19274.691243] [<ffffffff81063e87>] ? local_bh_enable+0xa7/0x110
[19274.691249] [<ffffffff8169c008>] ? unix_stream_recvmsg+0x1d8/0x7f0
[19274.691254] [<ffffffff81614c88>] ? dev_queue_xmit+0x1a8/0x8a0
[19274.691258] [<ffffffff816f282a>] schedule+0x3a/0x60
[19274.691265] [<ffffffff816f4515>] schedule_hrtimeout_range_clock+0x105/0x120
[19274.691270] [<ffffffff81096c9d>] ? trace_hardirqs_on+0xd/0x10
[19274.691276] [<ffffffff81080d89>] ? add_wait_queue+0x49/0x60
[19274.691282] [<ffffffff816f453e>] schedule_hrtimeout_range+0xe/0x10
[19274.691291] [<ffffffff8113dc04>] poll_schedule_timeout+0x44/0x70
[19274.691297] [<ffffffff8113e29c>] do_sys_poll+0x33c/0x4f0
[19274.691303] [<ffffffff8113dcf0>] ? poll_freewait+0xc0/0xc0
[19274.691309] [<ffffffff8113ddf0>] ? __pollwait+0x100/0x100
[19274.691317] [<ffffffff81602c3d>] ? sock_update_classid+0xfd/0x140
[19274.691323] [<ffffffff81602bb0>] ? sock_update_classid+0x70/0x140
[19274.691330] [<ffffffff815ff1f7>] ? sock_recvmsg+0xf7/0x130
[19274.691336] [<ffffffff81098450>] ? __lock_acquire+0x490/0x2180
[19274.691343] [<ffffffff8110427e>] ? might_fault+0x4e/0xa0
[19274.691351] [<ffffffff8101a129>] ? sched_clock+0x9/0x10
[19274.691356] [<ffffffff810944cd>] ? trace_hardirqs_off+0xd/0x10
[19274.691363] [<ffffffff815ffb0b>] ? sys_recvfrom+0xbb/0x120
[19274.691370] [<ffffffff81082540>] ? process_cpu_clock_getres+0x10/0x10
[19274.691376] [<ffffffff8110427e>] ? might_fault+0x4e/0xa0
[19274.691383] [<ffffffff8110427e>] ? might_fault+0x4e/0xa0
[19274.691390] [<ffffffff816fe0ca>] ? sysret_check+0x2e/0x69
[19274.691396] [<ffffffff8113e647>] sys_poll+0x77/0x110
[19274.691402] [<ffffffff816fe092>] system_call_fastpath+0x16/0x1b
[19274.691407] ---[ end trace 74fbaae9066aadcc ]---
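
For anyone reading along, here is a minimal userspace sketch of what that
check compares. It assumes a simplified singly linked hash bucket; the
demo_* names are made up for illustration and are not lockdep's structures
or API. As in the quoted loop, a class is matched by key, and the name is
compared by pointer, not by string contents. A mismatch is counted here
instead of triggering a warning.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for lockdep's class and lock map. */
struct demo_class {
	const void *key;          /* identity used to find the class */
	const char *name;         /* name recorded when the class was registered */
	struct demo_class *next;  /* next entry in the same hash bucket */
};

struct demo_lock {
	const void *key;
	const char *name;
};

/* Counts key matches whose name pointer differed (stands in for the WARN). */
static int demo_name_mismatches;

/* Walk one bucket; return the class whose key matches, or NULL if none. */
static struct demo_class *demo_look_up(struct demo_class *bucket,
				       const struct demo_lock *lock)
{
	struct demo_class *c;

	assert(lock != NULL);

	for (c = bucket; c != NULL; c = c->next) {
		if (c->key == lock->key) {
			/* Same key but a different name pointer: the odd case. */
			if (c->name != lock->name)
				demo_name_mismatches++;
			return c;
		}
	}
	return NULL;
}
```

The point of the sketch is only that the lookup still succeeds when the
names differ; the mismatch is reported on the side, which is why the trace
above is a warning rather than a failure.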

Simon-