Re: [PATCH] IB/core: Fix ABBA deadlock in rdma_dev_exit_net

From: wujing

Date: Tue Dec 16 2025 - 05:00:42 EST


Hi Jason,

You're right that the locks aren't nested in rdma_dev_exit_net() - it does release
rdma_nets_rwsem before acquiring devices_rwsem. However, this is still an ABBA deadlock,
just not the trivial nested kind. The issue is caused by **rwsem writer priority**
and lock ordering inconsistency.

Here's the actual deadlock scenario:

**Thread A (rdma_dev_exit_net - cleanup_net workqueue):**
```
down_write(&rdma_nets_rwsem); // Acquired
xa_store(&rdma_nets, ...);
up_write(&rdma_nets_rwsem); // Released
down_read(&devices_rwsem); // Waiting here <-- BLOCKED
```

**Thread B (rdma_dev_init_net - stress-ng-clone):**
```
down_read(&devices_rwsem); // Acquired
down_read(&rdma_nets_rwsem); // Waiting here <-- BLOCKED
```

The deadlock happens because:

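1. Thread A releases rdma_nets_rwsem, which it held as a **writer**
2. Thread B (and many others) hold devices_rwsem for read and are waiting to acquire rdma_nets_rwsem as **readers**
3. Thread A then tries to acquire devices_rwsem as a reader
4. BUT: rwsem gives priority to pending writers over new readers, so a new down_read() waits once a writer is queued, even if only readers currently hold the lock
5. Thread B's read request on rdma_nets_rwsem therefore stays blocked while a writer holds or is queued on it (Thread A itself no longer holds it after up_write())
6. Thread B holds devices_rwsem for read; readers can share it, so Thread A's down_read() only blocks there if a writer is also queued on devices_rwsem
7. With writers pending on both locks, Thread A waits behind Thread B's hold on devices_rwsem while Thread B waits on rdma_nets_rwsem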

This is a deadlock driven by **rwsem writer priority**, not a simple nested-lock deadlock.

The production crash log shows both threads blocked:
- Thread A: `rdma_dev_exit_net+0x60` stuck in `rwsem_down_write_slowpath` trying to get devices_rwsem
- Thread B: `rdma_dev_init_net+0x120` stuck in `rwsem_down_read_slowpath` trying to get rdma_nets_rwsem

Lockdep doesn't catch this because:
1. Thread A never holds both locks at once, so lockdep records no rdma_nets_rwsem → devices_rwsem dependency from this path
2. It's a reader-writer priority issue, not a simple lock ordering issue
3. It requires specific timing: writer releases lock, then tries to acquire another
lock that readers (waiting for the first lock) already hold

The fix ensures both paths acquire locks in the same order:
- rdma_dev_init_net: devices_rwsem → rdma_nets_rwsem
- rdma_dev_exit_net: devices_rwsem → rdma_nets_rwsem (previously rdma_nets_rwsem was taken, and released, before devices_rwsem)

This removes the ordering mismatch that the writer-priority scenario depends on.

Best regards