Re: [PATCH RDMA v2] RDMA/rxe: add mutual exclusion in rxe_net_del()
From: Zhu Yanjun
Date: Sun May 17 2026 - 00:32:21 EST
On 2026/5/16 20:27, Zhu Yanjun wrote:
On 2026/5/16 19:15, Kuniyuki Iwashima wrote:
On Sat, May 16, 2026 at 4:40 PM Yanjun.Zhu <yanjun.zhu@xxxxxxxxx> wrote:
-1 for this.
On 5/16/26 7:00 AM, Edward Adam Davis wrote:
We must serialize calls to rxe_net_del() or risk a crash as syzbot
reported:
KASAN: null-ptr-deref in range [0x0000000000000020-0x0000000000000027]
Call Trace:
udp_tunnel_sock_release+0x6d/0x80 net/ipv4/udp_tunnel_core.c:197
rxe_release_udp_tunnel drivers/infiniband/sw/rxe/rxe_net.c:294 [inline]
rxe_sock_put drivers/infiniband/sw/rxe/rxe_net.c:639 [inline]
rxe_net_del+0xfb/0x290 drivers/infiniband/sw/rxe/rxe_net.c:660
rxe_dellink+0x15/0x20 drivers/infiniband/sw/rxe/rxe.c:254
Jason Gunthorpe suggested placing the lock within rxe to protect its racy
implementation of rxe_net_del(), which looks like it is possibly also
triggered by NETDEV_UNREGISTER.
The patch addressing this issue in nldev_dellink() has already been
applied (0b28000b64f4); however, since the fix has now been relocated
to rxe, the corresponding remedial code in nldev has been removed.
Fixes: f1327abd6abe ("RDMA/rxe: Support RDMA link creation and destruction per net namespace")
Fixes: 0b28000b64f4 ("RDMA/nldev: Add mutual exclusion in nldev_dellink()")
Reported-by: syzbot+d8f76778263ab65c2b21@xxxxxxxxxxxxxxxxxxxxxxxxx
Closes: https://syzkaller.appspot.com/bug?extid=d8f76778263ab65c2b21
Signed-off-by: Edward Adam Davis <eadavis@xxxxxx>
---
v1 -> v2: serialize calls to rxe net del
drivers/infiniband/core/nldev.c | 4 ----
drivers/infiniband/sw/rxe/rxe_net.c | 7 ++++++-
2 files changed, 6 insertions(+), 5 deletions(-)
diff --git a/drivers/infiniband/core/nldev.c b/drivers/infiniband/core/nldev.c
index 3cb3cb7629fe..96c745d5bac4 100644
--- a/drivers/infiniband/core/nldev.c
+++ b/drivers/infiniband/core/nldev.c
@@ -1816,8 +1816,6 @@ static int nldev_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
return err;
}
-static DEFINE_MUTEX(nldev_dellink_mutex);
-
static int nldev_dellink(struct sk_buff *skb, struct nlmsghdr *nlh,
struct netlink_ext_ack *extack)
{
@@ -1848,9 +1846,7 @@ static int nldev_dellink(struct sk_buff *skb, struct nlmsghdr *nlh,
* implicitly scoped to the driver supporting dynamic link deletion like RXE.
*/
if (device->link_ops && device->link_ops->dellink) {
- mutex_lock(&nldev_dellink_mutex);
err = device->link_ops->dellink(device);
- mutex_unlock(&nldev_dellink_mutex);
if (err)
return err;
}
diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
index 50a2cb5405e2..92847e955ca2 100644
--- a/drivers/infiniband/sw/rxe/rxe_net.c
+++ b/drivers/infiniband/sw/rxe/rxe_net.c
@@ -642,6 +642,8 @@ static void rxe_sock_put(struct sock *sk,
}
}
I read this commit carefully. There are two paths that can invoke
rxe_net_del().
One is the "rdma link del xxx" command, and the other is the netdevice
notification chain.
In the netdevice notification chain path, rtnl_lock is already held, and
rxe_net_del() is called under that lock.
However, the "rdma link del xxx" path does not take rtnl_lock.
Because of this, I would like to use the existing rtnl_lock to serialize
calls to rxe_net_del().
It's a global mutex and heavily contended because many
components use it without much care. We have been working
for years to reduce RTNL pressure by converting such
users to a dedicated lock or the per-netns RTNL mutex.
RTNL is not needed here at all, so please use a dedicated lock.
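Kuniyuki's point can be illustrated with a minimal userspace sketch. It uses POSIX threads rather than kernel primitives, and two threads stand in for the "rdma link del" path and the NETDEV_UNREGISTER path. The names pernet_sk, net_del() and release_count are hypothetical. A single mutex dedicated to this teardown, rather than a global lock shared with unrelated code, makes the two paths mutually exclusive. As a result, the shared socket is cleared and released exactly once.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-namespace UDP tunnel socket. */
static int *pernet_sk;
static int release_count;

/* Dedicated teardown lock, used instead of a global lock such as RTNL. */
static pthread_mutex_t net_del_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Both teardown paths call this; the mutex makes them mutually exclusive. */
static void net_del(void)
{
	int *sk;

	pthread_mutex_lock(&net_del_mutex);
	sk = pernet_sk;             /* look up ... */
	if (sk) {
		pernet_sk = NULL;   /* ... clear ... */
		free(sk);           /* ... and release, all under the lock */
		release_count++;
	}
	pthread_mutex_unlock(&net_del_mutex);
}

static void *teardown_path(void *arg)
{
	(void)arg;
	net_del();
	return NULL;
}

/* Race two teardown paths against each other; returns the release count. */
static int run_concurrent_teardown(void)
{
	pthread_t link_del, unregister;

	pernet_sk = malloc(sizeof(*pernet_sk));
	assert(pernet_sk != NULL);
	release_count = 0;

	pthread_create(&link_del, NULL, teardown_path, NULL);
	pthread_create(&unregister, NULL, teardown_path, NULL);
	pthread_join(link_del, NULL);
	pthread_join(unregister, NULL);
	return release_count;
}
```

run_concurrent_teardown() returns 1 regardless of thread scheduling. Whichever path runs second finds pernet_sk already NULL and does nothing.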
Thanks a lot for your review. I think the following commit can fix this problem.
Please review.
The root cause is clear. If no one disagrees with this commit, I will send out the official patch.
In the next revision, I will move the mutex into the network namespace.
I think we have discussed this problem thoroughly, and we all understand the root cause now.
Zhu Yanjun
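Moving the mutex into the network namespace ties the lock to the data it protects. The tunnel sockets are looked up per namespace (rxe_ns_pernet_sk4/6(net)), not per device. The following is a hedged userspace sketch of that layout, not the planned patch. All of struct netns_state, struct fake_dev and fake_net_del() are hypothetical, and the users counter is an assumed stand-in for whatever reference counting the real socket uses. Devices in the same namespace contend on one lock, while devices in different namespaces do not contend at all.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical per-namespace state; every name here is illustrative. */
struct netns_state {
	pthread_mutex_t lock;   /* serializes socket teardown in this netns */
	int *sk;                /* stand-in for the shared tunnel socket */
	int users;              /* devices still using the socket (assumed) */
	int releases;           /* how many times the socket was released */
};

/* Hypothetical device: only records which namespace it lives in. */
struct fake_dev {
	struct netns_state *ns;
};

static void netns_init(struct netns_state *ns, int users)
{
	pthread_mutex_init(&ns->lock, NULL);
	ns->sk = malloc(sizeof(*ns->sk));
	assert(ns->sk != NULL);
	ns->users = users;
	ns->releases = 0;
}

/* Per-device teardown: takes the lock of the device's namespace. */
static void fake_net_del(struct fake_dev *dev)
{
	struct netns_state *ns = dev->ns;

	pthread_mutex_lock(&ns->lock);
	if (ns->sk && --ns->users == 0) {
		/* Last user in this namespace: release and clear the socket. */
		free(ns->sk);
		ns->sk = NULL;
		ns->releases++;
	}
	pthread_mutex_unlock(&ns->lock);
}

static void *del_thread(void *arg)
{
	fake_net_del(arg);
	return NULL;
}

/* Two devices in namespace a, one in namespace b, deleted concurrently. */
static int run_per_netns_demo(struct netns_state *a, struct netns_state *b)
{
	struct fake_dev devs[3] = { { a }, { a }, { b } };
	pthread_t t[3];
	int i;

	netns_init(a, 2);
	netns_init(b, 1);
	for (i = 0; i < 3; i++)
		pthread_create(&t[i], NULL, del_thread, &devs[i]);
	for (i = 0; i < 3; i++)
		pthread_join(t[i], NULL);
	return a->releases + b->releases;
}
```

Each namespace's socket is released exactly once, by the last device deleted in that namespace.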
From 80525f5b7fb0af18b9759cbde0237aabb76158cc Mon Sep 17 00:00:00 2001
From: Zhu Yanjun <yanjun.zhu@xxxxxxxxx>
Date: Sat, 16 May 2026 22:27:35 +0200
Subject: [PATCH 1/1] RDMA/rxe: Fix Use-After-Free problem in rxe_net_del
syzbot reported a general protection fault (KASAN: null-ptr-deref) in
kernel_sock_shutdown() called during the software RoCE (rxe) link
deletion path (rxe_dellink -> rxe_net_del).
The root cause is a TOCTOU (Time-of-Check to Time-of-Use) race condition
in rxe_net_del(). Previously, the function fetched the socket pointer
via rxe_ns_pernet_sk4/6() outside the critical section, and then
acquired the lock to release it via rxe_sock_put().
In a highly concurrent teardown environment, another thread could close
and clear the pernet socket after it was fetched but before the lock
was acquired. This causes rxe_sock_put() to operate on a dangling or
already cleared socket pointer, leading to a NULL pointer dereference
when kernel_sock_shutdown() attempts to access sock->sk.
Fix this by introducing a dedicated per-device mutex, 'release_lock', and
holding it across the whole sequence. The socket pointers are now fetched,
checked, and released entirely within the same locked critical section,
which makes the socket lookup and teardown atomic with respect to other
callers of rxe_net_del() on the same device.
Reported-by: syzbot+d8f76778263ab65c2b21@xxxxxxxxxxxxxxxxxxxxxxxxx
Closes: https://syzkaller.appspot.com/bug?extid=d8f76778263ab65c2b21
Fixes: f1327abd6abe ("RDMA/rxe: Support RDMA link creation and destruction per net namespace")
Signed-off-by: Zhu Yanjun <yanjun.zhu@xxxxxxxxx>
---
drivers/infiniband/sw/rxe/rxe.c | 2 ++
drivers/infiniband/sw/rxe/rxe_net.c | 4 ++++
drivers/infiniband/sw/rxe/rxe_verbs.h | 1 +
3 files changed, 7 insertions(+)
diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c
index b0714f9abe3d..46967ecdaf7d 100644
--- a/drivers/infiniband/sw/rxe/rxe.c
+++ b/drivers/infiniband/sw/rxe/rxe.c
@@ -34,6 +34,7 @@ void rxe_dealloc(struct ib_device *ib_dev)
WARN_ON(!RB_EMPTY_ROOT(&rxe->mcg_tree));
mutex_destroy(&rxe->usdev_lock);
+ mutex_destroy(&rxe->release_lock);
}
static const struct ib_device_ops rxe_ib_dev_odp_ops = {
@@ -186,6 +187,7 @@ static void rxe_init(struct rxe_dev *rxe, struct net_device *ndev)
rxe->mcg_tree = RB_ROOT;
mutex_init(&rxe->usdev_lock);
+ mutex_init(&rxe->release_lock);
}
void rxe_set_mtu(struct rxe_dev *rxe, unsigned int ndev_mtu)
diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
index 50a2cb5405e2..c3b188538540 100644
--- a/drivers/infiniband/sw/rxe/rxe_net.c
+++ b/drivers/infiniband/sw/rxe/rxe_net.c
@@ -655,6 +655,8 @@ void rxe_net_del(struct ib_device *dev)
net = dev_net(ndev);
+ mutex_lock(&rxe->release_lock);
+
sk = rxe_ns_pernet_sk4(net);
if (sk)
rxe_sock_put(sk, rxe_ns_pernet_set_sk4, net);
@@ -663,6 +665,8 @@ void rxe_net_del(struct ib_device *dev)
if (sk)
rxe_sock_put(sk, rxe_ns_pernet_set_sk6, net);
+ mutex_unlock(&rxe->release_lock);
+
dev_put(ndev);
}
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
index d92f80d16f78..3f54aa0a4356 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.h
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
@@ -422,6 +422,7 @@ struct rxe_dev {
int max_ucontext;
int max_inline_data;
struct mutex usdev_lock;
+ struct mutex release_lock;
char raw_gid[ETH_ALEN];
--
2.43.0
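The TOCTOU window described in the commit message can be modeled deterministically in userspace. In this model, a callback plays the part of the concurrent teardown that runs between the lookup and the lock. This is a sketch only: it uses pthreads and hypothetical names, and a static int stands in for the socket so nothing is actually freed. The real bug was a NULL dereference in kernel_sock_shutdown(), whereas here the stale pointer is only detected, never dereferenced. The check-then-lock shape ends up acting on a pointer that is no longer current, while the shape that performs the lookup inside the critical section cannot.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

static int sock_storage;   /* stand-in object for the pernet socket */
static int *pernet_sk;     /* shared pointer, cleared on teardown */
static pthread_mutex_t release_lock = PTHREAD_MUTEX_INITIALIZER;

/* Simulates another teardown path clearing the pointer under the lock. */
static void other_path_clears(void)
{
	pthread_mutex_lock(&release_lock);
	pernet_sk = NULL;
	pthread_mutex_unlock(&release_lock);
}

/* Racy shape: check outside the lock, use inside it.
 * Returns 1 if the pointer acted on under the lock is stale. */
static int del_check_then_lock(void (*interleave)(void))
{
	int *sk = pernet_sk;            /* time of check */
	int stale = 0;

	if (interleave)
		interleave();           /* window for the other path */
	pthread_mutex_lock(&release_lock);
	if (sk) {
		stale = (sk != pernet_sk); /* time of use */
		pernet_sk = NULL;
	}
	pthread_mutex_unlock(&release_lock);
	return stale;
}

/* Fixed shape: check and use inside the same critical section. */
static int del_check_under_lock(void (*interleave)(void))
{
	int *sk;
	int stale = 0;

	if (interleave)
		interleave();
	pthread_mutex_lock(&release_lock);
	sk = pernet_sk;                 /* lookup happens under the lock */
	if (sk) {
		stale = (sk != pernet_sk); /* always 0: nothing can change it here */
		pernet_sk = NULL;
	}
	pthread_mutex_unlock(&release_lock);
	return stale;
}
```

Setting pernet_sk = &sock_storage and passing other_path_clears makes del_check_then_lock() report a stale pointer. With the same setup, del_check_under_lock() simply observes that the socket is already gone.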
My proposed commit is shown below. I am not sure whether it fully
resolves the problem.
diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c
index b0714f9abe3d..84266dc416c4 100644
--- a/drivers/infiniband/sw/rxe/rxe.c
+++ b/drivers/infiniband/sw/rxe/rxe.c
@@ -251,7 +251,9 @@ static int rxe_newlink(const char *ibdev_name, struct net_device *ndev)
static int rxe_dellink(struct ib_device *dev)
{
+ rtnl_lock();
rxe_net_del(dev);
+ rtnl_unlock();
return 0;
}
diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
index 50a2cb5405e2..ac53ea73996d 100644
--- a/drivers/infiniband/sw/rxe/rxe_net.c
+++ b/drivers/infiniband/sw/rxe/rxe_net.c
@@ -649,6 +649,8 @@ void rxe_net_del(struct ib_device *dev)
struct sock *sk;
struct net *net;
+ ASSERT_RTNL();
+
ndev = rxe_ib_device_get_netdev(&rxe->ib_dev);
if (!ndev)
return;
Zhu Yanjun
+static DEFINE_MUTEX(rxe_net_del_mutex);
+
void rxe_net_del(struct ib_device *dev)
{
struct rxe_dev *rxe = container_of(dev, struct rxe_dev, ib_dev);
@@ -649,9 +651,10 @@ void rxe_net_del(struct ib_device *dev)
struct sock *sk;
struct net *net;
+ mutex_lock(&rxe_net_del_mutex);
ndev = rxe_ib_device_get_netdev(&rxe->ib_dev);
if (!ndev)
- return;
+ goto out;
net = dev_net(ndev);
@@ -664,6 +667,8 @@ void rxe_net_del(struct ib_device *dev)
rxe_sock_put(sk, rxe_ns_pernet_set_sk6, net);
dev_put(ndev);
+out:
+ mutex_unlock(&rxe_net_del_mutex);
}
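The return-to-goto change in the hunk above matters because the mutex is now taken before the netdev lookup. A bare return on the !ndev path would leave the mutex held, and the next caller would block forever. The following minimal pthreads sketch shows the single-exit pattern. Here struct fake_ndev and fake_net_del() are hypothetical, and decrementing refs stands in for dev_put(ndev).

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

static pthread_mutex_t net_del_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical netdev stand-in with a reference count. */
struct fake_ndev {
	int refs;
};

static void fake_net_del(struct fake_ndev *ndev)
{
	pthread_mutex_lock(&net_del_mutex);
	if (!ndev)
		goto out;    /* a plain "return" here would leave the mutex held */

	/* ... look up and release the pernet sockets here ... */
	ndev->refs--;        /* stand-in for dev_put(ndev) */
out:
	pthread_mutex_unlock(&net_del_mutex);
}

/* Returns 1 if the mutex is free again after the call. */
static int mutex_released_after(struct fake_ndev *ndev)
{
	fake_net_del(ndev);
	if (pthread_mutex_trylock(&net_del_mutex) != 0)
		return 0;
	pthread_mutex_unlock(&net_del_mutex);
	return 1;
}
```

Both the early-exit path (ndev == NULL) and the normal path leave the mutex unlocked. This is the property the out: label provides in the patch.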
static void rxe_port_event(struct rxe_dev *rxe,