Re: Deadlock in cleanup_net and addrconf_verify_work locks up workqueue
From: Eric Dumazet
Date: Wed Feb 12 2020 - 00:10:40 EST
On 2/11/20 11:23 AM, Sargun Dhillon wrote:
> We've found a workqueue stall / deadlock. Our workload is a container-oriented
> workload in which we utilize IPv6. Our container (namespace) churn is quite
> frequent, and containers can be terminated before their networking is
> even setup.
>
> We're running 4.19.73 in production, and in investigation of the underlying
> causes, I don't think that future versions of 4.19 fix it.
>
> We've narrowed it down to a lockup between ipv6_addrconf, and cleanup_net.
Sure, PID 1369493 addrconf_verify_work() is waiting for RTNL.
But PID 8 ?
__flush_work() is being called.
But from where ? Stacks seem not complete.
>
> crash> bt 8
> PID: 8 TASK: ffff9a1072b50000 CPU: 24 COMMAND: "kworker/u192:0"
> #0 [ffffbfe2c00fbb70] __schedule at ffffffffa7f02bf7
> #1 [ffffbfe2c00fbc10] schedule at ffffffffa7f031e8
> #2 [ffffbfe2c00fbc18] schedule_timeout at ffffffffa7f0700e
> #3 [ffffbfe2c00fbc90] wait_for_completion at ffffffffa7f03b50
> #4 [ffffbfe2c00fbce0] __flush_work at ffffffffa76a2532
> #5 [ffffbfe2c00fbd58] rollback_registered_many at ffffffffa7dbcdf4
> #6 [ffffbfe2c00fbdc0] unregister_netdevice_many at ffffffffa7dbd31e
> #7 [ffffbfe2c00fbdd0] default_device_exit_batch at ffffffffa7dbd512
> #8 [ffffbfe2c00fbe40] cleanup_net at ffffffffa7dab970
> #9 [ffffbfe2c00fbe98] process_one_work at ffffffffa76a17c4
> #10 [ffffbfe2c00fbed8] worker_thread at ffffffffa76a19dd
> #11 [ffffbfe2c00fbf10] kthread at ffffffffa76a7fd3
> #12 [ffffbfe2c00fbf50] ret_from_fork at ffffffffa80001ff
>
> crash> bt 1369493
> PID: 1369493 TASK: ffff9a03684d9600 CPU: 58 COMMAND: "kworker/58:1"
> #0 [ffffbfe30d68fd48] __schedule at ffffffffa7f02bf7
> #1 [ffffbfe30d68fde8] schedule at ffffffffa7f031e8
> #2 [ffffbfe30d68fdf0] schedule_preempt_disabled at ffffffffa7f0349a
> #3 [ffffbfe30d68fdf8] __mutex_lock at ffffffffa7f04aed
> #4 [ffffbfe30d68fe90] addrconf_verify_work at ffffffffa7e8d1aa
> #5 [ffffbfe30d68fe98] process_one_work at ffffffffa76a17c4
> #6 [ffffbfe30d68fed8] worker_thread at ffffffffa76a19dd
> #7 [ffffbfe30d68ff10] kthread at ffffffffa76a7fd3
> #8 [ffffbfe30d68ff50] ret_from_fork at ffffffffa80001ff
>
>
>
> struct -x mutex.owner.counter rtnl_mutex
> owner.counter = 0xffff9a1072b50001
>
> 0xffff9a1072b50001 & (~0x07) = 0xffff9a1072b50000
>
> This points back to PID 8 / CPU 24. It is working on cleanup_net, and a part
> of cleanup net involves calling ops_exit_list, and as part of that it calls
> default_device_exit_batch. default_device_exit_batch takes the rtnl lock before
> calling into unregister_netdevice_many, and rollback_registered_many.
> rollback_registered_many calls flush_all_backlogs. This will never complete
> because it is holding the rtnl lock, and PID 1369493 / CPU 58 is waiting
> for rtnl_lock.
But PID 1369493 is waiting on a mutex, thus properly yielding the cpu.
(schedule() is clearly shown)
This should not prevent other threads
from making progress so that flush_all_backlogs() completes eventually.
flush_all_backlogs() does not care of how many threads are currently blocked
because they can not grab rtnl while flush_all_backlogs() is running.
>
> If relevant, the workqueue stalls themselves look something like:
> BUG: workqueue lockup - pool cpus=70 node=0 flags=0x0 nice=0 stuck for 3720s!
> BUG: workqueue lockup - pool cpus=70 node=0 flags=0x0 nice=-20 stuck for 3719s!
> Showing busy workqueues and worker pools:
> workqueue events: flags=0x0
> pwq 32: cpus=16 node=0 flags=0x0 nice=0 active=2/256
> in-flight: 1274779:slab_caches_to_rcu_destroy_workfn slab_caches_to_rcu_destroy_workfn
> workqueue events_highpri: flags=0x10
> pwq 141: cpus=70 node=0 flags=0x0 nice=-20 active=1/256
> pending: flush_backlog BAR(8)
> workqueue events_power_efficient: flags=0x82
> pwq 193: cpus=0-23,48-71 node=0 flags=0x4 nice=0 active=1/256
> in-flight: 1396446:check_lifetime
> workqueue mm_percpu_wq: flags=0x8
> pwq 140: cpus=70 node=0 flags=0x0 nice=0 active=1/256
> pending: vmstat_update
> workqueue netns: flags=0xe000a
> pwq 192: cpus=0-95 flags=0x4 nice=0 active=1/1
> in-flight: 8:cleanup_net
> delayed: cleanup_net
> workqueue writeback: flags=0x4e
> pwq 193: cpus=0-23,48-71 node=0 flags=0x4 nice=0 active=1/256
> in-flight: 1334335:wb_workfn
> workqueue kblockd: flags=0x18
> pwq 141: cpus=70 node=0 flags=0x0 nice=-20 active=1/256
> pending: blk_mq_run_work_fn
> workqueue ipv6_addrconf: flags=0x40008
> pwq 116: cpus=58 node=0 flags=0x0 nice=0 active=1/1
> in-flight: 1369493:addrconf_verify_work
> workqueue ena: flags=0xe000a
> pwq 192: cpus=0-95 flags=0x4 nice=0 active=1/1
> in-flight: 7505:ena_fw_reset_device [ena]
>