Re: possible deadlock in _destroy_id

From: Jason Gunthorpe
Date: Wed Nov 18 2020 - 08:38:22 EST


On Wed, Nov 18, 2020 at 03:10:21AM -0800, syzbot wrote:

> HEAD commit: 20529233 Add linux-next specific files for 20201118
> git tree: linux-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=13093cf2500000
> kernel config: https://syzkaller.appspot.com/x/.config?x=2c4fb58b6526b3c1
> dashboard link: https://syzkaller.appspot.com/bug?extid=1bc48bf7f78253f664a9
> compiler: gcc (GCC) 10.1.0-syz 20200507
>
> Unfortunately, I don't have any reproducer for this issue yet.

Oh? Is this because the error injection is too random?

> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+1bc48bf7f78253f664a9@xxxxxxxxxxxxxxxxxxxxxxxxx
>
> iwpm_register_pid: Unable to send a nlmsg (client = 2)
> infiniband syz1: RDMA CMA: cma_listen_on_dev, error -98
> ============================================
> WARNING: possible recursive locking detected
> 5.10.0-rc4-next-20201118-syzkaller #0 Not tainted
> syz-executor.5/12844 is trying to acquire lock:
> ffffffff8c684748 (lock#6){+.+.}-{3:3}, at: cma_release_dev drivers/infiniband/core/cma.c:476 [inline]
> ffffffff8c684748 (lock#6){+.+.}-{3:3}, at: _destroy_id+0x299/0xa00 drivers/infiniband/core/cma.c:1852
>
> but task is already holding lock:
> ffffffff8c684748 (lock#6){+.+.}-{3:3}, at: cma_add_one+0x55c/0xce0 drivers/infiniband/core/cma.c:4902

Leon, this is caused by

commit c80a0c52d85c49a910d0dc0e342e8d8898677dc0
Author: Leon Romanovsky <leon@xxxxxxxxxx>
Date: Wed Nov 4 16:40:07 2020 +0200

RDMA/cma: Add missing error handling of listen_id

Don't silently continue if rdma_listen() fails but destroy previously
created CM_ID and return an error to the caller.

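rdma_destroy_id() can't be called while holding the global lock. cma_add_one() already holds it (cma.c:4902 in the report), and _destroy_id() takes it again through cma_release_dev() (cma.c:476), so the new error unwind acquires the same mutex recursively.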

This is quite hard to fix. I came up with this ugly thing: