Re: nvme-tcp: fix a possible UAF when failing to send request

From: Maurizio Lombardi
Date: Wed Feb 12 2025 - 03:53:51 EST


On Wed Feb 12, 2025 at 9:11 AM CET, Maurizio Lombardi wrote:
> On Tue Feb 11, 2025 at 9:04 AM CET, zhang.guanghui@xxxxxxxx wrote:
>> Hi 
>>
>>     This is a race issue; I can't reproduce it reliably yet. I have not tested the latest kernel, but I have synced some nvme-tcp patches from the latest upstream,
>
> Hello, could you try this patch?
>
> queue_lock should protect against concurrent "error recovery",
> + mutex_lock(&queue->queue_lock);

Unfortunately, I've just realized that queue_lock won't save us
from the race against the controller reset: it's still possible
that we lock a mutex that has already been destroyed. So just try
this simplified patch; I will try to figure out something else:

diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 841238f38fdd..b714e1691c30 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -2660,7 +2660,10 @@ static int nvme_tcp_poll(struct blk_mq_hw_ctx *hctx, struct io_comp_batch *iob)
 	set_bit(NVME_TCP_Q_POLLING, &queue->flags);
 	if (sk_can_busy_loop(sk) && skb_queue_empty_lockless(&sk->sk_receive_queue))
 		sk_busy_loop(sk, true);
+
+	mutex_lock(&queue->send_mutex);
 	nvme_tcp_try_recv(queue);
+	mutex_unlock(&queue->send_mutex);
 	clear_bit(NVME_TCP_Q_POLLING, &queue->flags);
 	return queue->nr_cqe;
 }
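
To illustrate the idea behind the hunk above, here is a minimal userspace
sketch. It is not kernel code. It assumes the intent of the patch is to keep
the polling receive path from completing (and freeing) a request while the
send path may still be touching it. The names demo_request, demo_queue,
demo_send, demo_poll and run_demo are hypothetical, and pthread mutexes stand
in for the kernel's struct mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the driver's structures; not the kernel types. */
struct demo_request {
	int bytes_sent;
};

struct demo_queue {
	pthread_mutex_t send_mutex;	/* plays the role of queue->send_mutex */
	struct demo_request *req;	/* in-flight request, NULL once completed */
	int nr_cqe;			/* number of completions seen */
};

/* Send path: only touches the request while holding send_mutex. */
static void *demo_send(void *arg)
{
	struct demo_queue *q = arg;

	pthread_mutex_lock(&q->send_mutex);
	if (q->req)			/* may already have been completed */
		q->req->bytes_sent += 64;
	pthread_mutex_unlock(&q->send_mutex);
	return NULL;
}

/* Poll path: completes and frees the request under the same mutex. */
static void *demo_poll(void *arg)
{
	struct demo_queue *q = arg;

	pthread_mutex_lock(&q->send_mutex);
	if (q->req) {
		free(q->req);
		q->req = NULL;
		q->nr_cqe++;
	}
	pthread_mutex_unlock(&q->send_mutex);
	return NULL;
}

/* Races one sender against one poller per round; returns total completions. */
int run_demo(int rounds)
{
	struct demo_queue q = { .req = NULL, .nr_cqe = 0 };
	pthread_t sender, poller;
	int i;

	pthread_mutex_init(&q.send_mutex, NULL);
	for (i = 0; i < rounds; i++) {
		q.req = calloc(1, sizeof(*q.req));
		if (!q.req)
			abort();
		if (pthread_create(&sender, NULL, demo_send, &q) != 0)
			abort();
		if (pthread_create(&poller, NULL, demo_poll, &q) != 0)
			abort();
		pthread_join(sender, NULL);
		pthread_join(poller, NULL);
		assert(q.req == NULL);	/* the poller always completes the request */
	}
	pthread_mutex_destroy(&q.send_mutex);
	return q.nr_cqe;
}
```

Whichever thread wins the lock, the sender never dereferences a freed
request, because the free and the access are serialized by the same mutex.
The sketch deliberately leaves out queue teardown: the mutex here outlives
both threads, so it says nothing about locking a mutex that has already been
destroyed, which is the problem noted above for queue_lock.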

Maurizio