Re: Linux 3.1-rc9
From: Yong Zhang
Date: Tue Oct 25 2011 - 21:48:14 EST
On Tue, Oct 25, 2011 at 08:26:31AM -0700, Simon Kirby wrote:
> On Tue, Oct 18, 2011 at 01:12:41PM -0700, Linus Torvalds wrote:
>
> > On Tue, Oct 18, 2011 at 12:48 PM, Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
> > >
> > > It does not look related.
> >
> > Yeah, the only lock held there seems to be the socket lock, and it
> > looks like all CPUs are spinning on it.
> >
> > > Could you try to reproduce that problem with
> > > lockdep enabled? lockdep might make it go away, but it's definitely
> > > worth a try.
> >
> > And DEBUG_SPINLOCK / DEBUG_SPINLOCK_SLEEP too. Maybe you're triggering
> > some odd networking thing. It sounds unlikely, but maybe some error
> > case you get into doesn't release the socket lock.
> >
> > I think PROVE_LOCKING already enables DEBUG_SPINLOCK, but the sleeping
> > lock thing is separate, iirc.
>
> I think the config option you were trying to think of is
> CONFIG_DEBUG_ATOMIC_SLEEP, which enables CONFIG_PREEMPT_COUNT.
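For reference, the debugging options named in this thread could be collected in a .config fragment like the one below. This is only a sketch: it assumes a 3.1-era tree in which DEBUG_SPINLOCK_SLEEP has been renamed to DEBUG_ATOMIC_SLEEP, and it lists DEBUG_SPINLOCK explicitly even though PROVE_LOCKING is expected to select it.

```
# Lock debugging options discussed above (assumed 3.1-era option names)
CONFIG_PROVE_LOCKING=y
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_ATOMIC_SLEEP=y
```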
>
> By the way, we got this WARN_ON_ONCE while running lockdep elsewhere:
>
>     /*
>      * We can walk the hash lockfree, because the hash only
>      * grows, and we are careful when adding entries to the end:
>      */
>     list_for_each_entry(class, hash_head, hash_entry) {
>         if (class->key == key) {
>             WARN_ON_ONCE(class->name != lock->name);
Someone has hit this before; maybe you can try the patch in:
http://marc.info/?l=linux-kernel&m=131919035525533
Thanks,
Yong
>             return class;
>         }
>     }
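To make the quoted check easier to follow, here is a minimal userspace sketch of the same idea: a lookup walks a hash chain by key and notes when the name pointer stored for that key differs from the one passed in. It is not the lockdep code. The names demo_class, demo_register, demo_look_up, HASH_SIZE and the name_mismatches counter are all hypothetical, and the sketch does not model the lock-free concurrency the kernel comment describes.

```c
#include <assert.h>
#include <stddef.h>

#define HASH_SIZE 16  /* hypothetical table size for this sketch */

/* Simplified stand-in for a lock class entry (not the kernel struct). */
struct demo_class {
    const void *key;          /* identity of the class */
    const char *name;         /* name pointer recorded at registration */
    struct demo_class *next;  /* hash chain link */
};

static struct demo_class *demo_hash[HASH_SIZE];
static int name_mismatches;   /* counts what WARN_ON_ONCE would flag */

static size_t demo_hash_key(const void *key)
{
    return ((size_t)key >> 4) % HASH_SIZE;
}

/* Append a class to the end of its hash chain. */
static void demo_register(struct demo_class *cls, const void *key,
                          const char *name)
{
    struct demo_class **link = &demo_hash[demo_hash_key(key)];

    assert(cls != NULL);
    cls->key = key;
    cls->name = name;
    cls->next = NULL;
    while (*link)
        link = &(*link)->next;
    *link = cls;
}

/*
 * Walk the chain for this key. A matching key with a different name
 * pointer is the condition the quoted WARN_ON_ONCE checks for.
 */
static struct demo_class *demo_look_up(const void *key, const char *name)
{
    struct demo_class *cls;

    for (cls = demo_hash[demo_hash_key(key)]; cls; cls = cls->next) {
        if (cls->key == key) {
            if (cls->name != name)  /* pointer comparison, as in the quote */
                name_mismatches++;
            return cls;
        }
    }
    return NULL;
}
```

In this sketch the mismatch is only counted; the lookup still returns the class found for the key, which mirrors the quoted code returning class after the warning.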
>
> [19274.691090] ------------[ cut here ]------------
> [19274.691107] WARNING: at kernel/lockdep.c:690 __lock_acquire+0xfd6/0x2180()
> [19274.691112] Hardware name: PowerEdge 2950
> [19274.691115] Modules linked in: drbd lru_cache cn ipmi_devintf ipmi_si ipmi_msghandler sata_sil24 bnx2
> [19274.691137] Pid: 4416, comm: heartbeat Not tainted 3.1.0-hw-lockdep+ #52
> [19274.691141] Call Trace:
> [19274.691149] [<ffffffff81098f96>] ? __lock_acquire+0xfd6/0x2180
> [19274.691156] [<ffffffff8105c4f0>] warn_slowpath_common+0x80/0xc0
> [19274.691163] [<ffffffff8105c545>] warn_slowpath_null+0x15/0x20
> [19274.691169] [<ffffffff81098f96>] __lock_acquire+0xfd6/0x2180
> [19274.691175] [<ffffffff8109a2e9>] ? lock_release_non_nested+0x1a9/0x340
> [19274.691181] [<ffffffff8109a7b9>] lock_acquire+0x109/0x140
> [19274.691185] [<ffffffff8104a302>] ? double_rq_lock+0x52/0x80
> [19274.691191] [<ffffffff813a4f8a>] ? __delay+0xa/0x10
> [19274.691197] [<ffffffff816f55fa>] _raw_spin_lock_nested+0x3a/0x50
> [19274.691201] [<ffffffff8104a302>] ? double_rq_lock+0x52/0x80
> [19274.691205] [<ffffffff8104a302>] double_rq_lock+0x52/0x80
> [19274.691210] [<ffffffff81058167>] load_balance+0x897/0x16e0
> [19274.691215] [<ffffffff81058199>] ? load_balance+0x8c9/0x16e0
> [19274.691219] [<ffffffff8104d172>] ? update_shares+0xd2/0x150
> [19274.691226] [<ffffffff816f2572>] ? __schedule+0x842/0xa20
> [19274.691232] [<ffffffff816f2608>] __schedule+0x8d8/0xa20
> [19274.691238] [<ffffffff816f2572>] ? __schedule+0x842/0xa20
> [19274.691243] [<ffffffff81063e87>] ? local_bh_enable+0xa7/0x110
> [19274.691249] [<ffffffff8169c008>] ? unix_stream_recvmsg+0x1d8/0x7f0
> [19274.691254] [<ffffffff81614c88>] ? dev_queue_xmit+0x1a8/0x8a0
> [19274.691258] [<ffffffff816f282a>] schedule+0x3a/0x60
> [19274.691265] [<ffffffff816f4515>] schedule_hrtimeout_range_clock+0x105/0x120
> [19274.691270] [<ffffffff81096c9d>] ? trace_hardirqs_on+0xd/0x10
> [19274.691276] [<ffffffff81080d89>] ? add_wait_queue+0x49/0x60
> [19274.691282] [<ffffffff816f453e>] schedule_hrtimeout_range+0xe/0x10
> [19274.691291] [<ffffffff8113dc04>] poll_schedule_timeout+0x44/0x70
> [19274.691297] [<ffffffff8113e29c>] do_sys_poll+0x33c/0x4f0
> [19274.691303] [<ffffffff8113dcf0>] ? poll_freewait+0xc0/0xc0
> [19274.691309] [<ffffffff8113ddf0>] ? __pollwait+0x100/0x100
> [19274.691317] [<ffffffff81602c3d>] ? sock_update_classid+0xfd/0x140
> [19274.691323] [<ffffffff81602bb0>] ? sock_update_classid+0x70/0x140
> [19274.691330] [<ffffffff815ff1f7>] ? sock_recvmsg+0xf7/0x130
> [19274.691336] [<ffffffff81098450>] ? __lock_acquire+0x490/0x2180
> [19274.691343] [<ffffffff8110427e>] ? might_fault+0x4e/0xa0
> [19274.691351] [<ffffffff8101a129>] ? sched_clock+0x9/0x10
> [19274.691356] [<ffffffff810944cd>] ? trace_hardirqs_off+0xd/0x10
> [19274.691363] [<ffffffff815ffb0b>] ? sys_recvfrom+0xbb/0x120
> [19274.691370] [<ffffffff81082540>] ? process_cpu_clock_getres+0x10/0x10
> [19274.691376] [<ffffffff8110427e>] ? might_fault+0x4e/0xa0
> [19274.691383] [<ffffffff8110427e>] ? might_fault+0x4e/0xa0
> [19274.691390] [<ffffffff816fe0ca>] ? sysret_check+0x2e/0x69
> [19274.691396] [<ffffffff8113e647>] sys_poll+0x77/0x110
> [19274.691402] [<ffffffff816fe092>] system_call_fastpath+0x16/0x1b
> [19274.691407] ---[ end trace 74fbaae9066aadcc ]---
>
> Simon-
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
--
Only stand for myself