On Tue, Jan 28, 2025 at 12:28 AM Hannes Reinecke <hare@xxxxxxx> wrote:
On 1/27/25 18:38, Caleb Sander wrote:I am not seeing where sk_rcvtimeo is ignored when MSG_WAITALL is used,
On Sun, Jan 26, 2025 at 11:37 PM Hannes Reinecke <hare@xxxxxxx> wrote:Hmm. With checking the code 'rcvtimeo' is only evaluated if MSG_WAITALL
On 1/24/25 19:43, Caleb Sander Mateos wrote:Yes, if the NVMe/TCP controller sends less than 128 bytes, we need to
nvme_tcp_init_connection() attempts to receive an ICResp PDU but onlyBut won't we have to wait for a TCP timeout now if the sender sends less
checks that the return value from recvmsg() is non-negative. If the
sender closes the TCP connection or sends fewer than 128 bytes, this
check will pass even though the full PDU wasn't received.
Ensure the full ICResp PDU is received by checking that recvmsg()
returns the expected 128 bytes.
Additionally set the MSG_WAITALL flag for recvmsg(), as a sender could
split the ICResp over multiple TCP frames. Without MSG_WAITALL,
recvmsg() could return prematurely with only part of the PDU.
Signed-off-by: Caleb Sander Mateos <csander@xxxxxxxxxxxxxxx>
Fixes: 3f2304f8c6d6 ("nvme-tcp: add NVMe over TCP host driver")
---
v4: keep recvmsg() error return value
v3: fix return value to indicate error
v2: add Fixes tag
drivers/nvme/host/tcp.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index e9ff6babc540..56679eb8c0d6 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -1446,15 +1446,18 @@ static int nvme_tcp_init_connection(struct nvme_tcp_queue *queue)
iov.iov_len = sizeof(*icresp);
if (nvme_tcp_queue_tls(queue)) {
msg.msg_control = cbuf;
msg.msg_controllen = sizeof(cbuf);
}
+ msg.msg_flags = MSG_WAITALL;
ret = kernel_recvmsg(queue->sock, &msg, &iov, 1,
iov.iov_len, msg.msg_flags);
than 128 bytes? With this patch we always wait for 128 bytes, and
possibly wait for TCP timeout if not.
wait for it to send the remainder of the ICResp PDU. That's just how
the NVMe/TCP protocol works. If we want to protect against
buggy/malicious controllers that don't send a full ICResp, we need a
timeout mechanism. That's the purpose of the existing
`queue->sock->sk->sk_rcvtimeo = 10 * HZ;` in nvme_tcp_alloc_queue().
Note that recvmsg() timing out was already possible in the original
code if the controller didn't send anything on the TCP connection
after accepting it.
is _not_ set. Makes me wonder why we do set it...
But that's beside the point.
can you point me to the code?
It is certainly ignored for *non-blocking* recvmsg() (MSG_DONTWAIT),
but I don't see why it would be ignored for MSG_WAITALL. man 7 socket
also suggests it applies to all blocking socket I/O.
static inline long sock_rcvtimeo(const struct sock *sk, bool noblock)
{
return noblock ? 0 : sk->sk_rcvtimeo;
}
In tcp_recvmsg_locked():
timeo = sock_rcvtimeo(sk, flags & MSG_DONTWAIT);
The driver uses non-blocking socket I/O in the I/O path, but not theNo, the host doesn't need to know. TLS is enabled by the lowerTestcase for this would be nice ...Are you saying that with secure concatenation we don't know in advance
And I need to check if secure concatenation is affected here; with
secure concatenation we need to peek at the first packet to check
if it's an ICRESP or a TLS negotiation.
whether the connection is using TLS between the TCP and NVMe/TCP
protocol layers? Wouldn't the host already need to know that when it
sent its ICReq PDU?
layers.
But upon further checking, I guess it'll be okay with secure
concatenation.
Nevertheless, I would vastly prefer to have a receive loop here
instead of waiting to receive the full amount as per MSG_WAITALL.
The entire tcp code is using nonblocking calls, so I'd rather
keep it that way and implement a receive loop here.
connect path. nvme_tcp_init_connection() is already using blocking
socket I/O to send the ICReq PDU and receive the ICResp. I suppose we
could change the connect path to register poll interest and use
non-blocking operations, but that seems like a more involved code
change and orthogonal to the issue of receiving the full ICResp.