Deadlock in cleanup_net and addrconf_verify_work locks up workqueue

From: Sargun Dhillon
Date: Tue Feb 11 2020 - 14:23:36 EST


We've found a workqueue stall / deadlock. We run a container-oriented
workload that uses IPv6. Our container (namespace) churn is quite
frequent, and containers can be terminated before their networking is
even set up.

We're running 4.19.73 in production, and from investigating the underlying
cause, I don't believe that later 4.19 releases fix it.

We've narrowed it down to a lockup between ipv6_addrconf and cleanup_net.

crash> bt 8
PID: 8 TASK: ffff9a1072b50000 CPU: 24 COMMAND: "kworker/u192:0"
#0 [ffffbfe2c00fbb70] __schedule at ffffffffa7f02bf7
#1 [ffffbfe2c00fbc10] schedule at ffffffffa7f031e8
#2 [ffffbfe2c00fbc18] schedule_timeout at ffffffffa7f0700e
#3 [ffffbfe2c00fbc90] wait_for_completion at ffffffffa7f03b50
#4 [ffffbfe2c00fbce0] __flush_work at ffffffffa76a2532
#5 [ffffbfe2c00fbd58] rollback_registered_many at ffffffffa7dbcdf4
#6 [ffffbfe2c00fbdc0] unregister_netdevice_many at ffffffffa7dbd31e
#7 [ffffbfe2c00fbdd0] default_device_exit_batch at ffffffffa7dbd512
#8 [ffffbfe2c00fbe40] cleanup_net at ffffffffa7dab970
#9 [ffffbfe2c00fbe98] process_one_work at ffffffffa76a17c4
#10 [ffffbfe2c00fbed8] worker_thread at ffffffffa76a19dd
#11 [ffffbfe2c00fbf10] kthread at ffffffffa76a7fd3
#12 [ffffbfe2c00fbf50] ret_from_fork at ffffffffa80001ff

crash> bt 1369493
PID: 1369493 TASK: ffff9a03684d9600 CPU: 58 COMMAND: "kworker/58:1"
#0 [ffffbfe30d68fd48] __schedule at ffffffffa7f02bf7
#1 [ffffbfe30d68fde8] schedule at ffffffffa7f031e8
#2 [ffffbfe30d68fdf0] schedule_preempt_disabled at ffffffffa7f0349a
#3 [ffffbfe30d68fdf8] __mutex_lock at ffffffffa7f04aed
#4 [ffffbfe30d68fe90] addrconf_verify_work at ffffffffa7e8d1aa
#5 [ffffbfe30d68fe98] process_one_work at ffffffffa76a17c4
#6 [ffffbfe30d68fed8] worker_thread at ffffffffa76a19dd
#7 [ffffbfe30d68ff10] kthread at ffffffffa76a7fd3
#8 [ffffbfe30d68ff50] ret_from_fork at ffffffffa80001ff



crash> struct -x mutex.owner.counter rtnl_mutex
owner.counter = 0xffff9a1072b50001

0xffff9a1072b50001 & (~0x07) = 0xffff9a1072b50000
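
For completeness, here is the same decode as a tiny standalone C snippet
(plain arithmetic on the value reported by crash, not kernel code; the 0x07
mask reflects my understanding that the kernel keeps a few mutex state flags
in the low bits of the owner word, and the low bit being set here would be
consistent with a waiter being queued):

#include <stdio.h>
#include <inttypes.h>

int main(void)
{
	/* raw mutex.owner.counter value reported by crash above */
	uint64_t owner = 0xffff9a1072b50001ULL;
	/* low bits hold mutex state flags, the rest is the task_struct */
	uint64_t flag_bits = 0x07;

	printf("task_struct = 0x%" PRIx64 ", flags = 0x%" PRIx64 "\n",
	       owner & ~flag_bits, owner & flag_bits);
	return 0;
}

This prints task_struct = 0xffff9a1072b50000, flags = 0x1.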

That owner value points back to PID 8 / CPU 24, which is running
cleanup_net. Part of cleanup_net is calling ops_exit_list, and as part of
that it calls default_device_exit_batch. default_device_exit_batch takes
the rtnl lock before calling into unregister_netdevice_many and
rollback_registered_many, and rollback_registered_many calls
flush_all_backlogs. That flush can never complete while PID 8 is holding
the rtnl lock, because PID 1369493 / CPU 58 is stuck in
addrconf_verify_work waiting for that same rtnl lock.
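
To make the circular wait concrete, here is a small userspace sketch of the
pattern using pthreads. It is only an analogue, not kernel code, and every
name in it is made up: a single worker thread stands in for the stuck
kworker pool, big_lock for rtnl_mutex, job_verify for addrconf_verify_work,
and job_flush for the flush_backlog work that flush_all_backlogs waits on.

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER; /* "rtnl_mutex" */
static pthread_mutex_t q_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t q_cond = PTHREAD_COND_INITIALIZER;
static void (*queue[2])(void);
static int q_head, q_tail, flush_done;

static void queue_work(void (*fn)(void))
{
	pthread_mutex_lock(&q_lock);
	queue[q_tail++ % 2] = fn;
	pthread_cond_broadcast(&q_cond);
	pthread_mutex_unlock(&q_lock);
}

static void job_verify(void)	/* "addrconf_verify_work": needs big_lock */
{
	printf("worker: job_verify blocking on big_lock\n");
	pthread_mutex_lock(&big_lock);	/* never succeeds: main holds it */
	pthread_mutex_unlock(&big_lock);
}

static void job_flush(void)	/* "flush_backlog": signals completion */
{
	pthread_mutex_lock(&q_lock);
	flush_done = 1;
	pthread_cond_broadcast(&q_cond);
	pthread_mutex_unlock(&q_lock);
}

static void *worker(void *arg)	/* the single worker for this "pool" */
{
	(void)arg;
	for (;;) {
		void (*fn)(void);

		pthread_mutex_lock(&q_lock);
		while (q_head == q_tail)
			pthread_cond_wait(&q_cond, &q_lock);
		fn = queue[q_head++ % 2];
		pthread_mutex_unlock(&q_lock);
		fn();	/* stuck inside job_verify, so job_flush never runs */
	}
	return NULL;
}

int main(void)
{
	pthread_t t;

	pthread_create(&t, NULL, worker, NULL);

	pthread_mutex_lock(&big_lock);	/* "cleanup_net takes rtnl_lock" */
	queue_work(job_verify);		/* worker picks this up and blocks */
	queue_work(job_flush);		/* queued behind it, never runs */

	printf("main: waiting for flush while holding big_lock\n");
	pthread_mutex_lock(&q_lock);	/* "wait_for_completion" in __flush_work */
	while (!flush_done)
		pthread_cond_wait(&q_cond, &q_lock);
	pthread_mutex_unlock(&q_lock);

	pthread_mutex_unlock(&big_lock);	/* never reached */
	return 0;
}

Built with gcc -pthread, this prints its two messages and then hangs the
same way: the flush can only complete once the worker is free, and the
worker can only become free once the lock it needs is dropped, which only
happens after the flush.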

If relevant, the workqueue stalls themselves look something like:
BUG: workqueue lockup - pool cpus=70 node=0 flags=0x0 nice=0 stuck for 3720s!
BUG: workqueue lockup - pool cpus=70 node=0 flags=0x0 nice=-20 stuck for 3719s!
Showing busy workqueues and worker pools:
workqueue events: flags=0x0
pwq 32: cpus=16 node=0 flags=0x0 nice=0 active=2/256
in-flight: 1274779:slab_caches_to_rcu_destroy_workfn slab_caches_to_rcu_destroy_workfn
workqueue events_highpri: flags=0x10
pwq 141: cpus=70 node=0 flags=0x0 nice=-20 active=1/256
pending: flush_backlog BAR(8)
workqueue events_power_efficient: flags=0x82
pwq 193: cpus=0-23,48-71 node=0 flags=0x4 nice=0 active=1/256
in-flight: 1396446:check_lifetime
workqueue mm_percpu_wq: flags=0x8
pwq 140: cpus=70 node=0 flags=0x0 nice=0 active=1/256
pending: vmstat_update
workqueue netns: flags=0xe000a
pwq 192: cpus=0-95 flags=0x4 nice=0 active=1/1
in-flight: 8:cleanup_net
delayed: cleanup_net
workqueue writeback: flags=0x4e
pwq 193: cpus=0-23,48-71 node=0 flags=0x4 nice=0 active=1/256
in-flight: 1334335:wb_workfn
workqueue kblockd: flags=0x18
pwq 141: cpus=70 node=0 flags=0x0 nice=-20 active=1/256
pending: blk_mq_run_work_fn
workqueue ipv6_addrconf: flags=0x40008
pwq 116: cpus=58 node=0 flags=0x0 nice=0 active=1/1
in-flight: 1369493:addrconf_verify_work
workqueue ena: flags=0xe000a
pwq 192: cpus=0-95 flags=0x4 nice=0 active=1/1
in-flight: 7505:ena_fw_reset_device [ena]