Hi,
We found that nvme connect deadlocks when it cannot allocate a tag in the admin queue. We reproduced the problem and found a workaround: use a reserved tag for connecting.
Here is the deadlock scenario:
1. The process [kworker/u129:1+nvme-wq], which wants to reconnect, is waiting to get a tag, but all tags are used up:
[<0>] blk_mq_get_tag+0x11d/0x2d0
[<0>] __blk_mq_alloc_request+0x92/0x180
[<0>] blk_mq_alloc_request+0x7c/0xc0
[<0>] nvme_alloc_request+0x28/0x100 [nvme_core]
[<0>] __nvme_submit_sync_cmd+0x1ea/0x230 [nvme_core]
[<0>] nvmf_reg_read64+0x62/0xa0 [nvme_fabrics]
[<0>] nvme_enable_ctrl+0x25/0xb0 [nvme_core]
[<0>] nvme_tcp_setup_ctrl+0x257/0x340 [nvme_tcp]
[<0>] nvme_tcp_reconnect_ctrl_work+0x24/0x40 [nvme_tcp]
[<0>] process_one_work+0x228/0x3d0
[<0>] worker_thread+0x4d/0x3f0
[<0>] kthread+0x127/0x150
[<0>] ret_from_fork+0x1f/0x30
2. Many processes (here, nvme list) hold admin tags while waiting for the connection to be re-established:
[<0>] blk_execute_rq+0x8d/0x110
[<0>] nvme_execute_passthru_rq+0x60/0x1f0 [nvme_core]
[<0>] nvme_submit_user_cmd+0x23e/0x400 [nvme_core]
[<0>] nvme_user_cmd+0x163/0x1d0 [nvme_core]
[<0>] nvme_ctrl_ioctl+0x2e/0x40 [nvme_core]
[<0>] __nvme_ioctl+0x78/0xc0 [nvme_core]
[<0>] nvme_ioctl+0x1e/0x20 [nvme_core]
[<0>] blkdev_ioctl+0x126/0x260
[<0>] block_ioctl+0x4a/0x60
[<0>] __x64_sys_ioctl+0x91/0xc0
[<0>] do_syscall_64+0x59/0xc0
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xae
It is very easy to reproduce:
1. Run many nvme list commands
2. Make nvme I/O time out so that the connection is recovered
3. The trick is to set a long reconnect delay, e.g. 30s
The solution is the appended patch. It has been tested, and it also considers tag reservation for keep-alive and reset/shutdown.
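
To illustrate why a reserved tag breaks the deadlock, here is a minimal user-space sketch of a tag pool split into a normal and a reserved range. It is not the patch itself and not kernel code: NR_TAGS, NR_RESERVED, struct tag_pool, tag_alloc(), tag_free() and exhaust_normal_tags() are hypothetical names chosen for illustration only. In blk-mq the corresponding knobs are, as far as I know, the reserved_tags field of struct blk_mq_tag_set and the BLK_MQ_REQ_RESERVED allocation flag.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_TAGS     8	/* hypothetical admin queue depth */
#define NR_RESERVED 2	/* hypothetical number of reserved tags */

struct tag_pool {
	bool in_use[NR_TAGS];
};

/*
 * Tags [0, NR_RESERVED) form the reserved pool, the rest the normal pool.
 * Returns a tag, or -1 if the chosen pool is empty. A blocking allocator
 * would sleep at that point, which is where the reconnect worker got stuck.
 */
static int tag_alloc(struct tag_pool *pool, bool reserved)
{
	int first = reserved ? 0 : NR_RESERVED;
	int last = reserved ? NR_RESERVED : NR_TAGS;

	for (int tag = first; tag < last; tag++) {
		if (!pool->in_use[tag]) {
			pool->in_use[tag] = true;
			return tag;
		}
	}
	return -1;
}

/* Return a previously allocated tag to the pool. */
static void tag_free(struct tag_pool *pool, int tag)
{
	assert(tag >= 0 && tag < NR_TAGS);
	pool->in_use[tag] = false;
}

/*
 * Model the report: user passthru commands take every normal tag and
 * never complete because the controller is down. Returns how many
 * normal tags were taken.
 */
static int exhaust_normal_tags(struct tag_pool *pool)
{
	int taken = 0;

	while (tag_alloc(pool, false) >= 0)
		taken++;
	return taken;
}
```

After exhaust_normal_tags() on a zeroed pool, tag_alloc(pool, false) fails, which models the reconnect worker waiting in blk_mq_get_tag, while tag_alloc(pool, true) still succeeds because the passthru commands can never take the reserved range. The same reasoning is why the reserved pool has to be sized for every user that draws from it, such as connect and keep-alive.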