On Sun, 2012-04-29 at 10:40 +0200, Eric Dumazet wrote:On Sun, 2012-04-29 at 10:27 +0200, Eric Dumazet wrote:On Sun, 2012-04-29 at 10:41 +0300, Denys Fedoryshchenko wrote:I apologize for late night post, and a lot of trash left in report.
I will cleanup it up now, and send with CC to maintainers.
Server job are proxy, with very high rate of new connections.
Deadlock at peaktime can be easily reproduced in 10-15 minutes.
Deadlock occured on almost all 3.3-stable versions (tried 3.3.3 -
3.3.4). It is not easy to try older kernel,
but if required i can try.
Usually also, because SYN rate very high, i can see:
[ 51.612987] TCP: Possible SYN flooding on port 8080. Sending
cookies. Check SNMP counters.
[ 762.903868]
[ 762.903880] =================================
[ 762.903890] [ INFO: inconsistent lock state ]
[ 762.903903] 3.3.4-build-0061 #8 Not tainted
[ 762.904133] ---------------------------------
[ 762.904344] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
[ 762.904542] squid/1603 [HC0[0]:SC0[0]:HE1:SE1] takes:
[ 762.904542] (key#3){+.?...}, at: [<c0232cc4>]
__percpu_counter_sum+0xd/0x58
[ 762.904542] {IN-SOFTIRQ-W} state was registered at:
[ 762.904542] [<c0158b84>] __lock_acquire+0x284/0xc26
[ 762.904542] [<c01598e8>] lock_acquire+0x71/0x85
[ 762.904542] [<c0349765>] _raw_spin_lock+0x33/0x40
[ 762.904542] [<c0232c93>] __percpu_counter_add+0x58/0x7c
[ 762.904542] [<c02cfde1>] sk_clone_lock+0x1e5/0x200
[ 762.904542] [<c0303ee4>] inet_csk_clone_lock+0xe/0x78
[ 762.904542] [<c0315778>] tcp_create_openreq_child+0x1b/0x404
[ 762.904542] [<c031339c>] tcp_v4_syn_recv_sock+0x32/0x1c1
[ 762.904542] [<c031615a>] tcp_check_req+0x1fd/0x2d7
[ 762.904542] [<c0313f77>] tcp_v4_do_rcv+0xab/0x194
[ 762.904542] [<c03153bb>] tcp_v4_rcv+0x3b3/0x5cc
[ 762.904542] [<c02fc0c4>] ip_local_deliver_finish+0x13a/0x1e9
[ 762.904542] [<c02fc539>] NF_HOOK.clone.11+0x46/0x4d
[ 762.904542] [<c02fc652>] ip_local_deliver+0x41/0x45
[ 762.904542] [<c02fc4d1>] ip_rcv_finish+0x31a/0x33c
[ 762.904542] [<c02fc539>] NF_HOOK.clone.11+0x46/0x4d
[ 762.904542] [<c02fc857>] ip_rcv+0x201/0x23e
[ 762.904542] [<c02daa3a>] __netif_receive_skb+0x319/0x368
[ 762.904542] [<c02dac07>] netif_receive_skb+0x4e/0x7d
[ 762.904542] [<c02dacf6>] napi_skb_finish+0x1e/0x34
[ 762.904542] [<c02db122>] napi_gro_receive+0x20/0x24
[ 762.904542] [<f85d1743>] e1000_receive_skb+0x3f/0x45 [e1000e]
[ 762.904542] [<f85d3464>] e1000_clean_rx_irq+0x1f9/0x284 [e1000e]
[ 762.904542] [<f85d3926>] e1000_clean+0x62/0x1f4 [e1000e]
[ 762.904542] [<c02db228>] net_rx_action+0x90/0x160
[ 762.904542] [<c012a445>] __do_softirq+0x7b/0x118
[ 762.904542] irq event stamp: 156915469
[ 762.904542] hardirqs last enabled at (156915469): [<c019b4f4>]
__slab_alloc.clone.58.clone.63+0xc4/0x2de
[ 762.904542] hardirqs last disabled at (156915468): [<c019b452>]
__slab_alloc.clone.58.clone.63+0x22/0x2de
[ 762.904542] softirqs last enabled at (156915466): [<c02ce677>]
lock_sock_nested+0x64/0x6c
[ 762.904542] softirqs last disabled at (156915464): [<c0349914>]
_raw_spin_lock_bh+0xe/0x45
[ 762.904542]
[ 762.904542] other info that might help us debug this:
[ 762.904542] Possible unsafe locking scenario:
[ 762.904542]
[ 762.904542] CPU0
[ 762.904542] ----
[ 762.904542] lock(key#3);
[ 762.904542]<Interrupt>
[ 762.904542] lock(key#3);
[ 762.904542]
[ 762.904542] *** DEADLOCK ***
[ 762.904542]
[ 762.904542] 1 lock held by squid/1603:
[ 762.904542] #0: (sk_lock-AF_INET){+.+.+.}, at: [<c03055c0>]
lock_sock+0xa/0xc
[ 762.904542]
[ 762.904542] stack backtrace:
[ 762.904542] Pid: 1603, comm: squid Not tainted 3.3.4-build-0061 #8
[ 762.904542] Call Trace:
[ 762.904542] [<c0347b73>] ? printk+0x18/0x1d
[ 762.904542] [<c015873a>] valid_state+0x1f6/0x201
[ 762.904542] [<c0158816>] mark_lock+0xd1/0x1bb
[ 762.904542] [<c015876b>] ? mark_lock+0x26/0x1bb
[ 762.904542] [<c015805d>] ? check_usage_forwards+0x77/0x77
[ 762.904542] [<c0158bf8>] __lock_acquire+0x2f8/0xc26
[ 762.904542] [<c0159b8e>] ? mark_held_locks+0x5d/0x7b
[ 762.904542] [<c0159cf6>] ? trace_hardirqs_on+0xb/0xd
[ 762.904542] [<c0158dd4>] ? __lock_acquire+0x4d4/0xc26
[ 762.904542] [<c01598e8>] lock_acquire+0x71/0x85
[ 762.904542] [<c0232cc4>] ? __percpu_counter_sum+0xd/0x58
[ 762.904542] [<c0349765>] _raw_spin_lock+0x33/0x40
[ 762.904542] [<c0232cc4>] ? __percpu_counter_sum+0xd/0x58
[ 762.904542] [<c0232cc4>] __percpu_counter_sum+0xd/0x58
[ 762.904542] [<c02cebc4>] __sk_mem_schedule+0xdd/0x1c7
[ 762.904542] [<c02d178d>] ? __alloc_skb+0x76/0x100
[ 762.904542] [<c0305e8e>] sk_wmem_schedule+0x21/0x2d
[ 762.904542] [<c0306370>] sk_stream_alloc_skb+0x42/0xaa
[ 762.904542] [<c0306567>] tcp_sendmsg+0x18f/0x68b
[ 762.904542] [<c031f3dc>] ? ip_fast_csum+0x30/0x30
[ 762.904542] [<c0320193>] inet_sendmsg+0x53/0x5a
[ 762.904542] [<c02cb633>] sock_aio_write+0xd2/0xda
[ 762.904542] [<c015876b>] ? mark_lock+0x26/0x1bb
[ 762.904542] [<c01a1017>] do_sync_write+0x9f/0xd9
[ 762.904542] [<c01a2111>] ? file_free_rcu+0x2f/0x2f
[ 762.904542] [<c01a17a1>] vfs_write+0x8f/0xab
[ 762.904542] [<c01a284d>] ? fget_light+0x75/0x7c
[ 762.904542] [<c01a1900>] sys_write+0x3d/0x5e
[ 762.904542] [<c0349ec9>] syscall_call+0x7/0xb
[ 762.904542] [<c0340000>] ? rp_sidt+0x41/0x83
OK, so when we have memory pressure we can call
percpu_counter_read_positive() with SOFTIRQ enabled, and lockdep
complains...
This bug was probably added in 2008, in commit 1748376b6626a
(net: Use a percpu_counter for sockets_allocated)
Hmm, no this patch was fine.
Bug was in fact added by Glauber Costa in commit 180d8cd942ce336b2c869
(foundations of per-cgroup memory pressure controlling.)
Because he replaced the safe percpu_counter_read_positive() call to
unsafe percpu_counter_sum_positive() in
sk_sockets_allocated_read_positive()
But anyway the patch I sent should fix the problem.
Thinking again, I am not sure why Glauber did this change. He probably
made a typo or something.