Bisected: Kernel deadlock bug related to cgroups.

From: Brent Lovelace
Date: Tue Aug 30 2016 - 14:39:14 EST


I found a kernel deadlock regression bug introduced in the 4.4 kernel.

This bug occurs when running an ftp send-to-self test. We used libcurl against
a ftp server.
We are requesting to download a 45k file at about 17 times per second. It locks
up on its own after about 5 minutes. It will happen reliably when attempting to
ssh into the system while the ftp test is running.

Please CC me as I'm not on the lkml mailing list.


We used special routing rules to make send-to-self work.

[root@brent-6port ~]# ip ru show
11: from all to 10.0.0.1 iif eth3 lookup local
11: from all to 10.0.0.2 iif eth2 lookup local
50: from all oif eth0 lookup 1
50: from all oif eth1 lookup 2
50: from all oif eth3 lookup 4
50: from all oif eth2 lookup 3
50: from all oif eth4 lookup 5
400: from 10.0.0.1 lookup 4
400: from 10.0.0.2 lookup 3
512: from all lookup local
32766: from all lookup main
32767: from all lookup default

[root@brent-6port ~]# ip route show
default via 192.168.100.1 dev eth5
10.0.0.0/24 dev eth3 proto kernel scope link src 10.0.0.1
10.0.0.0/24 dev eth2 proto kernel scope link src 10.0.0.2
192.168.100.0/24 dev eth5 proto kernel scope link src 192.168.100.86

[root@brent-6port ~]# ip route show table 3
10.0.0.0/24 via 10.0.0.2 dev eth2

[root@brent-6port ~]# ip route show table 4
10.0.0.0/24 via 10.0.0.1 dev eth3

I bisected to this commit id:
----------------------------------------------------------------------------------
commit c9e75f0492b248aeaa7af8991a6fc9a21506bc96
Author: Oleg Nesterov <oleg@xxxxxxxxxx>
Date: Fri Nov 27 19:57:19 2015 +0100

cgroup: pids: fix race between cgroup_post_fork() and cgroup_migrate()

If the new child migrates to another cgroup before cgroup_post_fork() calls
subsys->fork(), then both pids_can_attach() and pids_fork() will do the same
pids_uncharge(old_pids) + pids_charge(pids) sequence twice.

Change copy_process() to call threadgroup_change_begin/threadgroup_change_end
unconditionally. percpu_down_read() is cheap and this allows other cleanups,
see the next changes.

Also, this way we can unify cgroup_threadgroup_rwsem and dup_mmap_sem.

Signed-off-by: Oleg Nesterov <oleg@xxxxxxxxxx>
Acked-by: Zefan Li <lizefan@xxxxxxxxxx>
Signed-off-by: Tejun Heo <tj@xxxxxxxxxx>

:040000 040000 ec136eff1aee5082f3aed3e9aa8c205e699d0c94 9123aae80295b265518797c6fd2a3af86b867648 M kernel
----------------------------------------------------------------------------------

Here is the console splat after the lockup occurs.

brent-6port login: INFO: task systemd:1 blocked for more than 180 seconds.
Not tainted 4.4.0-rc1+ #22
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
systemd D ffff88007dfcfd10 0 1 0 0x00000000
ffff88007dfcfd10 00ff88007dfcfcd8 0000000000000001 ffff88007e296f80
ffff88007dff8000 ffff88007dfd0000 ffffffff82bae220 ffff880036758c00
fffffffffffffff2 ffff88005e327b00 ffff88007dfcfd28 ffffffff816c571f
Call Trace:
[<ffffffff816c571f>] schedule+0x7a/0x8f
[<ffffffff81126ed2>] percpu_down_write+0xad/0xc4
[<ffffffff811220c8>] ? wake_up_atomic_t+0x25/0x25
[<ffffffff81169b4d>] __cgroup_procs_write+0x72/0x229
[<ffffffff8112b2eb>] ? lock_acquire+0x103/0x18f
[<ffffffff81169d21>] cgroup_procs_write+0xf/0x11
[<ffffffff81167b58>] cgroup_file_write+0x2c/0xd0
[<ffffffff8126dace>] kernfs_fop_write+0x106/0x14e
[<ffffffff812069cd>] __vfs_write+0x21/0xa0
[<ffffffff81126dad>] ? percpu_down_read+0x36/0x5f
[<ffffffff81208faa>] ? __sb_start_write+0x5a/0xab
[<ffffffff81208faa>] ? __sb_start_write+0x5a/0xab
[<ffffffff8120722d>] vfs_write+0x95/0xe0
[<ffffffff812079e0>] SyS_write+0x4b/0x79
[<ffffffff816c99b6>] entry_SYSCALL_64_fastpath+0x16/0x76
4 locks held by systemd/1:
#0: (sb_writers#6){.+.+.+}, at: [<ffffffff81208faa>] __sb_start_write+0x5a/0xab
#1: (&of->mutex){+.+.+.}, at: [<ffffffff8126da5d>] kernfs_fop_write+0x95/0x14e
#2: (cgroup_mutex){+.+.+.}, at: [<ffffffff81167369>] cgroup_kn_lock_live+0x4b/0x98
#3: (&cgroup_threadgroup_rwsem){++++++}, at: [<ffffffff81126e44>] percpu_down_write+0x1f/0xc4
INFO: task kworker/u8:2:185 blocked for more than 180 seconds.
Not tainted 4.4.0-rc1+ #22
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/u8:2 D ffff880036befb58 0 185 2 0x00000000
Workqueue: netns cleanup_net
ffff880036befb58 00ff880036befbd0 ffffffff00000002 ffff88007e316f80
ffff8800783e8000 ffff880036bf0000 ffff88005917bed0 ffff8800783e8000
ffffffff816c8953 ffff88005917bed8 ffff880036befb70 ffffffff816c571f
Call Trace:
[<ffffffff816c8953>] ? usleep_range+0x3a/0x3a
[<ffffffff816c571f>] schedule+0x7a/0x8f
[<ffffffff816c8982>] schedule_timeout+0x2f/0xd8
[<ffffffff816c9204>] ? _raw_spin_unlock_irq+0x27/0x3f
[<ffffffff816c8953>] ? usleep_range+0x3a/0x3a
[<ffffffff81129ea2>] ? trace_hardirqs_on_caller+0x16f/0x18b
[<ffffffff816c5f44>] do_wait_for_common+0xf0/0x127
[<ffffffff816c5f44>] ? do_wait_for_common+0xf0/0x127
[<ffffffff811101c3>] ? wake_up_q+0x42/0x42
[<ffffffff816c5ff2>] wait_for_common+0x36/0x50
[<ffffffff816c6024>] wait_for_completion+0x18/0x1a
[<ffffffff81106b5f>] kthread_stop+0xc8/0x217
[<ffffffffa06d440c>] pg_net_exit+0xbc/0x112 [pktgen]
[<ffffffff81604fa2>] ops_exit_list+0x3d/0x4e
[<ffffffff81606123>] cleanup_net+0x19f/0x234
[<ffffffff81101aa3>] process_one_work+0x237/0x46b
[<ffffffff8110216d>] worker_thread+0x1e7/0x292
[<ffffffff81101f86>] ? rescuer_thread+0x285/0x285
[<ffffffff81106ded>] kthread+0xc4/0xcc
[<ffffffff81106d29>] ? kthread_parkme+0x1f/0x1f
[<ffffffff816c9d6f>] ret_from_fork+0x3f/0x70
[<ffffffff81106d29>] ? kthread_parkme+0x1f/0x1f
3 locks held by kworker/u8:2/185:
#0: ("%s""netns"){.+.+.+}, at: [<ffffffff811019ad>] process_one_work+0x141/0x46b
#1: (net_cleanup_work){+.+.+.}, at: [<ffffffff811019ad>] process_one_work+0x141/0x46b
#2: (net_mutex){+.+.+.}, at: [<ffffffff81605ffe>] cleanup_net+0x7a/0x234
INFO: task systemd-udevd:704 blocked for more than 180 seconds.
Not tainted 4.4.0-rc1+ #22
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
systemd-udevd D ffff880078767cd8 0 704 1 0x00000080
ffff880078767cd8 0000000000000000 0000000000000000 ffff88007e216f80
ffff8800366ea480 ffff880078768000 ffff8800366ea480 0000000000000000
0000000001200011 00005649eb435028 ffff880078767cf0 ffffffff816c571f
Call Trace:
[<ffffffff816c571f>] schedule+0x7a/0x8f
[<ffffffff816c85ee>] rwsem_down_read_failed+0xdc/0xf8
[<ffffffff8136d634>] call_rwsem_down_read_failed+0x14/0x30
[<ffffffff8136d634>] ? call_rwsem_down_read_failed+0x14/0x30
[<ffffffff810e9b97>] ? copy_process+0x5b8/0x195d
[<ffffffff81126dc4>] ? percpu_down_read+0x4d/0x5f
[<ffffffff810e9b97>] copy_process+0x5b8/0x195d
[<ffffffff810eb094>] _do_fork+0x8c/0x2fb
[<ffffffff81177b2c>] ? __audit_syscall_entry+0xbb/0xdf
[<ffffffff81003526>] ? do_audit_syscall_entry+0x5b/0x5d
[<ffffffff8100366d>] ? syscall_trace_enter_phase1+0xe0/0xeb
[<ffffffff81003017>] ? trace_hardirqs_on_thunk+0x17/0x19
[<ffffffff810eb375>] SyS_clone+0x14/0x16
[<ffffffff816c99b6>] entry_SYSCALL_64_fastpath+0x16/0x76
1 lock held by systemd-udevd/704:
#0: (&cgroup_threadgroup_rwsem){++++++}, at: [<ffffffff810e9b97>] copy_process+0x5b8/0x195d
INFO: task btserver:1838 blocked for more than 180 seconds.
Not tainted 4.4.0-rc1+ #22
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
btserver D ffff88007a2a7cd8 0 1838 1663 0x00000000
ffff88007a2a7cd8 0000000000000000 0000000000000002 ffff88007e316f80
ffff88007a374900 ffff88007a2a8000 ffff88007a374900 0000000000000000
0000000000100011 00000000007d51bf ffff88007a2a7cf0 ffffffff816c571f
Call Trace:
[<ffffffff816c571f>] schedule+0x7a/0x8f
[<ffffffff816c85ee>] rwsem_down_read_failed+0xdc/0xf8
[<ffffffff8136d634>] call_rwsem_down_read_failed+0x14/0x30
[<ffffffff8136d634>] ? call_rwsem_down_read_failed+0x14/0x30
[<ffffffff810e9b97>] ? copy_process+0x5b8/0x195d
[<ffffffff81126dc4>] ? percpu_down_read+0x4d/0x5f
[<ffffffff810e9b97>] copy_process+0x5b8/0x195d
[<ffffffff810eb094>] _do_fork+0x8c/0x2fb
[<ffffffff81129ecb>] ? trace_hardirqs_on+0xd/0xf
[<ffffffff816c9209>] ? _raw_spin_unlock_irq+0x2c/0x3f
[<ffffffff810f9339>] ? __set_current_blocked+0x44/0x49
[<ffffffff810f5cc5>] ? copy_to_user+0x32/0x38
[<ffffffff81003044>] ? lockdep_sys_exit_thunk+0x12/0x14
[<ffffffff810eb375>] SyS_clone+0x14/0x16
[<ffffffff816c99b6>] entry_SYSCALL_64_fastpath+0x16/0x76
1 lock held by btserver/1838:
#0: (&cgroup_threadgroup_rwsem){++++++}, at: [<ffffffff810e9b97>] copy_process+0x5b8/0x195d
INFO: task btserver:2199 blocked for more than 180 seconds.
Not tainted 4.4.0-rc1+ #22
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
btserver D ffff88005e29bcd8 0 2199 1652 0x00000000
ffff88005e29bcd8 0000000000000000 0000000000000000 ffff88007e216f80
ffff88000019a480 ffff88005e29c000 ffff88000019a480 0000000000000000
0000000000100011 00000000007d51bf ffff88005e29bcf0 ffffffff816c571f
Call Trace:
[<ffffffff816c571f>] schedule+0x7a/0x8f
[<ffffffff816c85ee>] rwsem_down_read_failed+0xdc/0xf8
[<ffffffff8136d634>] call_rwsem_down_read_failed+0x14/0x30
[<ffffffff8136d634>] ? call_rwsem_down_read_failed+0x14/0x30
[<ffffffff810e9b97>] ? copy_process+0x5b8/0x195d
[<ffffffff81126dc4>] ? percpu_down_read+0x4d/0x5f
[<ffffffff810e9b97>] copy_process+0x5b8/0x195d
[<ffffffff810eb094>] _do_fork+0x8c/0x2fb
[<ffffffff81129ecb>] ? trace_hardirqs_on+0xd/0xf
[<ffffffff816c9209>] ? _raw_spin_unlock_irq+0x2c/0x3f
[<ffffffff810f9339>] ? __set_current_blocked+0x44/0x49
[<ffffffff810f5cc5>] ? copy_to_user+0x32/0x38
[<ffffffff81003044>] ? lockdep_sys_exit_thunk+0x12/0x14
[<ffffffff810eb375>] SyS_clone+0x14/0x16
[<ffffffff816c99b6>] entry_SYSCALL_64_fastpath+0x16/0x76
1 lock held by btserver/2199:
#0: (&cgroup_threadgroup_rwsem){++++++}, at: [<ffffffff810e9b97>] copy_process+0x5b8/0x195d
INFO: task vsftpd:4352 blocked for more than 180 seconds.
Not tainted 4.4.0-rc1+ #22
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
vsftpd D ffff880054867c68 0 4352 2611 0x00000000
ffff880054867c68 00ff88005933a480 ffff880000000000 ffff88007e216f80
ffff88005933a480 ffff880054868000 0000000000000246 ffff880054867cc0
ffff88005933a480 ffffffff81cea268 ffff880054867c80 ffffffff816c571f
Call Trace:
[<ffffffff816c571f>] schedule+0x7a/0x8f
[<ffffffff816c59b1>] schedule_preempt_disabled+0x10/0x19
[<ffffffff816c64c0>] mutex_lock_nested+0x1c0/0x3a0
[<ffffffff81606233>] ? copy_net_ns+0x7b/0xf8
[<ffffffff81606233>] copy_net_ns+0x7b/0xf8
[<ffffffff81606233>] ? copy_net_ns+0x7b/0xf8
[<ffffffff811073c9>] create_new_namespaces+0xfc/0x16b
[<ffffffff8110759c>] copy_namespaces+0x164/0x186
[<ffffffff810ea6b1>] copy_process+0x10d2/0x195d
[<ffffffff810eb094>] _do_fork+0x8c/0x2fb
[<ffffffff81003044>] ? lockdep_sys_exit_thunk+0x12/0x14
[<ffffffff810eb375>] SyS_clone+0x14/0x16
[<ffffffff816c99b6>] entry_SYSCALL_64_fastpath+0x16/0x76
2 locks held by vsftpd/4352:
#0: (&cgroup_threadgroup_rwsem){++++++}, at: [<ffffffff810e9b97>] copy_process+0x5b8/0x195d
#1: (net_mutex){+.+.+.}, at: [<ffffffff81606233>] copy_net_ns+0x7b/0xf8
INFO: task kpktgend_0:4354 blocked for more than 180 seconds.
Not tainted 4.4.0-rc1+ #22
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kpktgend_0 D ffff88005917bce8 0 4354 2 0x00000000
ffff88005917bce8 00ffffffa06d5d06 ffff880000000000 ffff88007e216f80
ffff88007a4ec900 ffff88005917c000 ffff88007a4ec900 ffffffffa06d5d06
ffff88005917bed0 0000000000000000 ffff88005917bd00 ffffffff816c571f
Call Trace:
[<ffffffffa06d5d06>] ? pg_net_init+0x346/0x346 [pktgen]
[<ffffffff816c571f>] schedule+0x7a/0x8f
[<ffffffff816c85ee>] rwsem_down_read_failed+0xdc/0xf8
[<ffffffff8136d634>] call_rwsem_down_read_failed+0x14/0x30
[<ffffffff8136d634>] ? call_rwsem_down_read_failed+0x14/0x30
[<ffffffff810f91e0>] ? exit_signals+0x17/0x103
[<ffffffff81126dc4>] ? percpu_down_read+0x4d/0x5f
[<ffffffff810f91e0>] exit_signals+0x17/0x103
[<ffffffff810ed342>] do_exit+0x105/0x9a4
[<ffffffffa06d5d06>] ? pg_net_init+0x346/0x346 [pktgen]
[<ffffffff81106df5>] kthread+0xcc/0xcc
[<ffffffff81106d29>] ? kthread_parkme+0x1f/0x1f
[<ffffffff816c9d6f>] ret_from_fork+0x3f/0x70
[<ffffffff81106d29>] ? kthread_parkme+0x1f/0x1f
1 lock held by kpktgend_0/4354:
#0: (&cgroup_threadgroup_rwsem){++++++}, at: [<ffffffff810f91e0>] exit_signals+0x17/0x103
INFO: task kpktgend_1:4355 blocked for more than 180 seconds.
Not tainted 4.4.0-rc1+ #22
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kpktgend_1 D ffff88007a4e3ce8 0 4355 2 0x00000000
ffff88007a4e3ce8 00ffffffa06d5d06 ffff880000000001 ffff88007e296f80
ffff88007a350000 ffff88007a4e4000 ffff88007a350000 ffffffffa06d5d06
ffff88007a4e3ed0 0000000000000000 ffff88007a4e3d00 ffffffff816c571f
Call Trace:
[<ffffffffa06d5d06>] ? pg_net_init+0x346/0x346 [pktgen]
[<ffffffff816c571f>] schedule+0x7a/0x8f
[<ffffffff816c85ee>] rwsem_down_read_failed+0xdc/0xf8
[<ffffffff8136d634>] call_rwsem_down_read_failed+0x14/0x30
[<ffffffff8136d634>] ? call_rwsem_down_read_failed+0x14/0x30
[<ffffffff810f91e0>] ? exit_signals+0x17/0x103
[<ffffffff81126dc4>] ? percpu_down_read+0x4d/0x5f
[<ffffffff810f91e0>] exit_signals+0x17/0x103
[<ffffffff810ed342>] do_exit+0x105/0x9a4
[<ffffffffa06d5d06>] ? pg_net_init+0x346/0x346 [pktgen]
[<ffffffff81106df5>] kthread+0xcc/0xcc
[<ffffffff81106d29>] ? kthread_parkme+0x1f/0x1f
[<ffffffff816c9d6f>] ret_from_fork+0x3f/0x70
[<ffffffff81106d29>] ? kthread_parkme+0x1f/0x1f
1 lock held by kpktgend_1/4355:
#0: (&cgroup_threadgroup_rwsem){++++++}, at: [<ffffffff810f91e0>] exit_signals+0x17/0x103
INFO: task kpktgend_2:4356 blocked for more than 180 seconds.
Not tainted 4.4.0-rc1+ #22
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kpktgend_2 D ffff8800549f7ce8 0 4356 2 0x00000000
ffff8800549f7ce8 00ffffffa06d5d06 ffff880000000002 ffff88007e316f80
ffff880054a52480 ffff8800549f8000 ffff880054a52480 ffffffffa06d5d06
ffff8800549f7ed0 0000000000000000 ffff8800549f7d00 ffffffff816c571f
Call Trace:
[<ffffffffa06d5d06>] ? pg_net_init+0x346/0x346 [pktgen]
[<ffffffff816c571f>] schedule+0x7a/0x8f
[<ffffffff816c85ee>] rwsem_down_read_failed+0xdc/0xf8
[<ffffffff8136d634>] call_rwsem_down_read_failed+0x14/0x30
[<ffffffff8136d634>] ? call_rwsem_down_read_failed+0x14/0x30
[<ffffffff810f91e0>] ? exit_signals+0x17/0x103
[<ffffffff81126dc4>] ? percpu_down_read+0x4d/0x5f
[<ffffffff810f91e0>] exit_signals+0x17/0x103
[<ffffffff810ed342>] do_exit+0x105/0x9a4
[<ffffffffa06d5d06>] ? pg_net_init+0x346/0x346 [pktgen]
[<ffffffff81106df5>] kthread+0xcc/0xcc
[<ffffffff81106d29>] ? kthread_parkme+0x1f/0x1f
[<ffffffff816c9d6f>] ret_from_fork+0x3f/0x70
[<ffffffff81106d29>] ? kthread_parkme+0x1f/0x1f
1 lock held by kpktgend_2/4356:
#0: (&cgroup_threadgroup_rwsem){++++++}, at: [<ffffffff810f91e0>] exit_signals+0x17/0x103
INFO: task kpktgend_3:4357 blocked for more than 180 seconds.
Not tainted 4.4.0-rc1+ #22
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kpktgend_3 D ffff88005e2b7ce8 0 4357 2 0x00000000
ffff88005e2b7ce8 00ffffffa06d5d06 ffff880000000003 ffff88007e396f80
ffff880054a54900 ffff88005e2b8000 ffff880054a54900 ffffffffa06d5d06
ffff88005e2b7ed0 0000000000000000 ffff88005e2b7d00 ffffffff816c571f
Call Trace:
[<ffffffffa06d5d06>] ? pg_net_init+0x346/0x346 [pktgen]
[<ffffffff816c571f>] schedule+0x7a/0x8f
[<ffffffff816c85ee>] rwsem_down_read_failed+0xdc/0xf8
[<ffffffff8136d634>] call_rwsem_down_read_failed+0x14/0x30
[<ffffffff8136d634>] ? call_rwsem_down_read_failed+0x14/0x30
[<ffffffff810f91e0>] ? exit_signals+0x17/0x103
[<ffffffff81126dc4>] ? percpu_down_read+0x4d/0x5f
[<ffffffff810f91e0>] exit_signals+0x17/0x103
[<ffffffff810ed342>] do_exit+0x105/0x9a4
[<ffffffffa06d5d06>] ? pg_net_init+0x346/0x346 [pktgen]
[<ffffffff81106df5>] kthread+0xcc/0xcc
[<ffffffff81106d29>] ? kthread_parkme+0x1f/0x1f
[<ffffffff816c9d6f>] ret_from_fork+0x3f/0x70
[<ffffffff81106d29>] ? kthread_parkme+0x1f/0x1f
1 lock held by kpktgend_3/4357:
#0: (&cgroup_threadgroup_rwsem){++++++}, at: [<ffffffff810f91e0>] exit_signals+0x17/0x103

login: timed out after 60 seconds

CTRL-A Z for help | 115200 8N1 | NOR | Minicom 2.7 | VT102 | Offline | ttyUSB0


Please let me know if I can offer further help with troubleshooting this problem.