[PATCH v2] nvme-tcp: Fix netns UAF introduced by commit 1be52169c348

From: shaopeijie
Date: Thu Apr 03 2025 - 10:54:34 EST


From: Peijie Shao <shaopeijie@xxxxxxxx>

This patch is for the nvme-tcp host side.

Commit 1be52169c348
("nvme-tcp: fix selinux denied when calling sock_sendmsg")
switched from sock_create() to sock_create_kern() to solve an
SELinux problem. However, sock_create_kern() does not take a
reference on the given netns, which results in a use-after-free
when a non-init_net netns is destroyed before sock_release().

For example, a container that does not share the host's network
namespace runs 'nvme connect' and is then stopped without running
'nvme disconnect'.

This patch passes init_net instead of current->nsproxy->net_ns,
so the socket always belongs to the host. It also naturally
avoids the socket's netns changing from the original creator's
netns to init_net when the socket is re-created by the nvme
recovery path (the workqueue runs in the init_net namespace).

Signed-off-by: Peijie Shao <shaopeijie@xxxxxxxx>
---

Changes in v2:
1. Fix style problems pointed out by Christoph Hellwig, thanks!
2. Add the 'nvme-tcp:' prefix to the patch title.

Version v1:
Hi all,
This is the v1 patch. Before this version, I tried calling
get_net(current->nsproxy->net_ns) in nvme_tcp_alloc_queue() to
fix the issue, but failed to find a suitable place to call
put_net(), because the socket is released by fput() internally.
I think code like the following:
nvme_tcp_free_queue() {
        fput()
        put_net()
}
cannot ensure the socket is released before put_net(), since
someone may still be holding the file.

So I would like to use the 'init_net' net namespace.
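
To illustrate the lifetime problem described above, here is a small
user-space model. It is only a sketch and not kernel code: every
toy_* name is hypothetical, and a 'destroyed' flag stands in for the
actual free so the stale access can be observed safely. take_ref ==
false models a creator that does not pin the netns (the behaviour
described above for sock_create_kern()), take_ref == true models one
that does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a network namespace: a refcount plus a flag that
 * records destruction instead of really freeing memory. */
struct toy_netns {
	int refcount;
	bool destroyed;
};

/* Toy stand-in for a socket that remembers its namespace. */
struct toy_sock {
	struct toy_netns *net;
	bool holds_net_ref;
};

static void toy_get_net(struct toy_netns *ns)
{
	assert(!ns->destroyed);
	ns->refcount++;
}

static void toy_put_net(struct toy_netns *ns)
{
	assert(ns->refcount > 0);
	if (--ns->refcount == 0)
		ns->destroyed = true;	/* last reference dropped */
}

/* Create a socket in ns; only pin the namespace when take_ref is set. */
static struct toy_sock toy_sock_create(struct toy_netns *ns, bool take_ref)
{
	struct toy_sock sk = { .net = ns, .holds_net_ref = take_ref };

	if (take_ref)
		toy_get_net(ns);
	return sk;
}

/* Release the socket. Returns true if its namespace was still alive,
 * false if the release would have touched a destroyed namespace. */
static bool toy_sock_release(struct toy_sock *sk)
{
	bool alive = !sk->net->destroyed;

	if (sk->holds_net_ref)
		toy_put_net(sk->net);
	sk->net = NULL;
	return alive;
}

/* The container drops its own reference before the socket is released,
 * as when it is stopped without 'nvme disconnect'. */
static bool toy_netns_alive_at_release(bool take_ref)
{
	struct toy_netns ns = { .refcount = 1, .destroyed = false };
	struct toy_sock sk = toy_sock_create(&ns, take_ref);

	toy_put_net(&ns);		/* container goes away */
	return toy_sock_release(&sk);
}
```

In this model toy_netns_alive_at_release(false) reports a stale
namespace at release time, while toy_netns_alive_at_release(true)
does not. Using init_net sidesteps the question, since as far as I
understand that namespace lives as long as the system does.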

---
drivers/nvme/host/tcp.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 26c459f0198d..9b1d0ad18b77 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -1789,8 +1789,14 @@ static int nvme_tcp_alloc_queue(struct nvme_ctrl *nctrl, int qid,
queue->cmnd_capsule_len = sizeof(struct nvme_command) +
NVME_TCP_ADMIN_CCSZ;

- ret = sock_create_kern(current->nsproxy->net_ns,
- ctrl->addr.ss_family, SOCK_STREAM,
+ /*
+ * sock_create_kern() does not take a reference to
+ * current->nsproxy->net_ns, use init_net instead.
+ * This also avoids changing sock's netns from previous
+ * creator's netns to init_net when sock is re-created
+ * by nvme recovery path.
+ */
+ ret = sock_create_kern(&init_net, ctrl->addr.ss_family, SOCK_STREAM,
IPPROTO_TCP, &queue->sock);
if (ret) {
dev_err(nctrl->device,
--
2.43.0