Re: Possible circular locking dependency while running valgrind on an executable over CIFS

From: Steve French
Date: Mon Apr 21 2008 - 08:12:41 EST


I thought that this was considered a false positive before, but Shaggy
might remember if that was the same warning.

But it would seem impractical to delay sending read requests from
readpages - that seems to defeat the whole purpose of having a
higher-performance read interface. And if tcp_sendmsg can deadlock when
called from readpages or other code which holds this semaphore,
wouldn't this deadlock all over the place (network block device
drivers, SAN drivers, various network file systems, some cluster file
systems)? readpages is a common entry point, and sendmsg or an
equivalent could be used by various networking-related block drivers,
file systems, etc.
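
For reference, the ordering lockdep is complaining about reduces to the
two chains below, shown as a userspace analogue with plain mutexes
standing in for the kernel locks. This is a simplification: in the
kernel, mmap_sem is an rwsem taken for read on both sides, so the cycle
only deadlocks for real when a writer is queued between the two
readers - which is exactly why lockdep still flags it.

/*
 * Userspace analogue of the two chains in the report; plain mutexes
 * stand in for the kernel locks, nothing here is CIFS code.
 * Build and run:  gcc -pthread abba-demo.c -o abba-demo && ./abba-demo
 * The sleeps widen the race window, so this will normally hang in
 * exactly the AB-BA pattern the lockdep report describes.
 */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t mmap_sem = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t sk_lock = PTHREAD_MUTEX_INITIALIZER;

/* Chain #0: do_page_fault -> cifs_readpages -> tcp_sendmsg */
static void *fault_then_send(void *unused)
{
	pthread_mutex_lock(&mmap_sem);	/* do_page_fault takes mmap_sem */
	sleep(1);
	pthread_mutex_lock(&sk_lock);	/* tcp_sendmsg takes sk_lock */
	pthread_mutex_unlock(&sk_lock);
	pthread_mutex_unlock(&mmap_sem);
	return NULL;
}

/* Chain #1: tcp_recvmsg -> memcpy_toiovec faults -> do_page_fault */
static void *recv_then_fault(void *unused)
{
	pthread_mutex_lock(&sk_lock);	/* tcp_recvmsg holds sk_lock */
	sleep(1);
	pthread_mutex_lock(&mmap_sem);	/* copy to user space faults */
	pthread_mutex_unlock(&mmap_sem);
	pthread_mutex_unlock(&sk_lock);
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, fault_then_send, NULL);
	pthread_create(&b, NULL, recv_then_fault, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	printf("lucky scheduling, no deadlock this run\n");
	return 0;
}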


On Mon, Apr 21, 2008 at 4:14 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
On Sun, 2008-04-20 at 20:04 -0700, Brandon Ehle wrote:
> While running valgrind on various executables located on a CIFS share,
> I eventually got the message below. The machine in question is
> running 2.6.25 with a few of the debug options enabled, including lock
> debugging. Any idea if this is indicative of a real problem or just a
> false positive?

Looks like a simple AB-BA deadlock to me. I admit to being a bit
ignorant about CIFS (and to not really minding that), but it looks like
you need to delay or decouple the actual sending of the read request
from the .readpages() implementation.
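
Something like the below is what I mean - .readpages() only queues the
marshalled request, and a workqueue issues the actual send, so
tcp_sendmsg() never runs under mmap_sem. An untested sketch: all the
cifs_* names here are made up, not existing CIFS functions.

#include <linux/kernel.h>
#include <linux/net.h>
#include <linux/socket.h>
#include <linux/uio.h>
#include <linux/workqueue.h>

struct cifs_read_req {
	struct work_struct work;
	struct socket *sock;	/* transport socket for this mount */
	struct kvec iov;	/* marshalled SMB READ request */
};

/* Runs from keventd: no mmap_sem is held here, so lock_sock() inside
 * tcp_sendmsg() can no longer complete the reported cycle. */
static void cifs_issue_read(struct work_struct *work)
{
	struct cifs_read_req *req =
		container_of(work, struct cifs_read_req, work);
	struct msghdr msg = { .msg_flags = MSG_NOSIGNAL };

	kernel_sendmsg(req->sock, &msg, &req->iov, 1, req->iov.iov_len);
}

/* Called from ->readpages() with mmap_sem held for read: only queue
 * the request instead of calling down into tcp_sendmsg() directly. */
static void cifs_queue_read(struct cifs_read_req *req)
{
	INIT_WORK(&req->work, cifs_issue_read);
	schedule_work(&req->work);
}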

Steve?

> > =======================================================
> > [ INFO: possible circular locking dependency detected ]
> > 2.6.25 #29
> > -------------------------------------------------------
> > valgrind.bin/23200 is trying to acquire lock:
> > (sk_lock-AF_INET){--..}, at: [<c0557412>] tcp_sendmsg+0x17/0xb00
> >
> > but task is already holding lock:
> > (&mm->mmap_sem){----}, at: [<c05c2abd>] do_page_fault+0x16d/0x8e0
> >
> > which lock already depends on the new lock.
> >
> >
> > the existing dependency chain (in reverse order) is:
> >
> > -> #1 (&mm->mmap_sem){----}:
> > [<c014bb79>] __lock_acquire+0xfa9/0xfdb
> > [<c014bc35>] lock_acquire+0x8a/0xbd
> > [<c05bf4f8>] down_read+0x3d/0x85
> > [<c05c2ad5>] do_page_fault+0x185/0x8e0
> > [<c05c102a>] error_code+0x72/0x78
> > [<c0526357>] memcpy_toiovec+0x43/0x5a
> > [<c0526910>] skb_copy_datagram_iovec+0x47/0x1d7
> > [<c0556e6a>] tcp_recvmsg+0xb3d/0xcc8
> > [<c051ff9f>] sock_common_recvmsg+0x3d/0x53
> > [<c051e8cf>] sock_recvmsg+0x10d/0x12a
> > [<c051f727>] sys_recvfrom+0x7c/0xc7
> > [<c051f7a8>] sys_recv+0x36/0x38
> > [<c051fbe8>] sys_socketcall+0x163/0x25b
> > [<c0104d78>] sysenter_past_esp+0x6d/0xc5
> > [<ffffffff>] 0xffffffff
> >
> > -> #0 (sk_lock-AF_INET){--..}:
> > [<c014b771>] __lock_acquire+0xba1/0xfdb
> > [<c014bc35>] lock_acquire+0x8a/0xbd
> > [<c052098c>] lock_sock_nested+0xb5/0xc2
> > [<c0557412>] tcp_sendmsg+0x17/0xb00
> > [<c051ea28>] sock_sendmsg+0xf9/0x116
> > [<c051ed79>] kernel_sendmsg+0x28/0x37
> > [<c02cd885>] SendReceive2+0x197/0x69b
> > [<c02bb122>] CIFSSMBRead+0x110/0x2a0
> > [<c02c4d81>] cifs_readpages+0x18c/0x5c7
> > [<c016db8a>] __do_page_cache_readahead+0x161/0x1e1
> > [<c016dc4f>] do_page_cache_readahead+0x45/0x59
> > [<c0166c6e>] filemap_fault+0x30f/0x467
> > [<c01739f4>] __do_fault+0x56/0x43a
> > [<c0175530>] handle_mm_fault+0x150/0x873
> > [<c05c2ced>] do_page_fault+0x39d/0x8e0
> > [<c05c102a>] error_code+0x72/0x78
> > [<ffffffff>] 0xffffffff
> >
> > other info that might help us debug this:
> >
> > 1 lock held by valgrind.bin/23200:
> > #0: (&mm->mmap_sem){----}, at: [<c05c2abd>] do_page_fault+0x16d/0x8e0
> >
> > stack backtrace:
> > Pid: 23200, comm: valgrind.bin Not tainted 2.6.25 #29
> > [<c014a979>] print_circular_bug_tail+0x68/0x6a
> > [<c014b771>] __lock_acquire+0xba1/0xfdb
> > [<c0104e81>] ? restore_nocheck+0x12/0x15
> > [<c014bc35>] lock_acquire+0x8a/0xbd
> > [<c0557412>] ? tcp_sendmsg+0x17/0xb00
> > [<c052098c>] lock_sock_nested+0xb5/0xc2
> > [<c0557412>] ? tcp_sendmsg+0x17/0xb00
> > [<c0557412>] tcp_sendmsg+0x17/0xb00
> > [<c0148a7d>] ? get_lock_stats+0x1b/0x3e
> > [<c0142185>] ? getnstimeofday+0x34/0xde
> > [<c0113253>] ? lapic_next_event+0x15/0x1e
> > [<c051ea28>] sock_sendmsg+0xf9/0x116
> > [<c01689e2>] ? mempool_alloc_slab+0xe/0x10
> > [<c013cb12>] ? autoremove_wake_function+0x0/0x3a
> > [<c05c078f>] ? _spin_unlock+0x27/0x42
> > [<c02cca11>] ? allocate_mid+0xec/0x149
> > [<c051ed79>] kernel_sendmsg+0x28/0x37
> > [<c02cd885>] SendReceive2+0x197/0x69b
> > [<c02bb122>] CIFSSMBRead+0x110/0x2a0
> > [<c02c4d81>] cifs_readpages+0x18c/0x5c7
> > [<c016bdf3>] ? __alloc_pages+0x6d/0x3c7
> > [<c014a3b1>] ? mark_held_locks+0x4d/0x84
> > [<c01604b1>] ? __rcu_read_unlock+0x81/0xa4
> > [<c02c4bf5>] ? cifs_readpages+0x0/0x5c7
> > [<c016db8a>] __do_page_cache_readahead+0x161/0x1e1
> > [<c016dc4f>] do_page_cache_readahead+0x45/0x59
> > [<c0166c6e>] filemap_fault+0x30f/0x467
> > [<c01739f4>] __do_fault+0x56/0x43a
> > [<c0175530>] handle_mm_fault+0x150/0x873
> > [<c0109d8f>] ? native_sched_clock+0xa6/0xea
> > [<c05c2abd>] ? do_page_fault+0x16d/0x8e0
> > [<c0140740>] ? down_read_trylock+0x51/0x59
> > [<c05c2ced>] do_page_fault+0x39d/0x8e0
> > [<c014aea1>] ? __lock_acquire+0x2d1/0xfdb
> > [<c017862a>] ? vma_link+0xbc/0xd4
> > [<c01670c0>] ? __filemap_fdatawrite_range+0x61/0x6d
> > [<c01672ef>] ? filemap_fdatawrite+0x26/0x28
> > [<c02c603e>] ? cifs_flush+0x1c/0x5c
> > [<c0104dc1>] ? sysenter_past_esp+0xb6/0xc5
> > [<c014a596>] ? trace_hardirqs_on+0x10a/0x15a
> > [<c05c2950>] ? do_page_fault+0x0/0x8e0
> > [<c05c102a>] error_code+0x72/0x78
> > =======================
>
> <snip .config>
>

--
Thanks,

Steve