Re: [PATCH v4] nvme-tcp: fix connect failure on receiving partial ICResp PDU

From: Caleb Sander
Date: Sun Feb 02 2025 - 19:19:21 EST


On Fri, Jan 31, 2025 at 12:17 AM Sagi Grimberg <sagi@xxxxxxxxxxx> wrote:
...
> Caleb, can you please make sure to test this patch with TLS?

Can you point me to some documentation for that? I tried setting up an
nvmet_tcp port to use TLS and connecting to it as described in the
cover letter for the patch series (https://lwn.net/Articles/941139/).
But the TLS Server Hello seems to fail with EACCES. Any idea what I'm
doing wrong?
$ modprobe nvmet_tcp
$ modprobe null_blk nr_devices=1
$ mkdir /sys/kernel/config/nvmet/subsystems/nqn.nvmet
$ echo 1 > /sys/kernel/config/nvmet/subsystems/nqn.nvmet/attr_allow_any_host
$ mkdir /sys/kernel/config/nvmet/subsystems/nqn.nvmet/namespaces/1
$ echo /dev/nullb0 > /sys/kernel/config/nvmet/subsystems/nqn.nvmet/namespaces/1/device_path
$ echo 1 > /sys/kernel/config/nvmet/subsystems/nqn.nvmet/namespaces/1/enable
$ mkdir /sys/kernel/config/nvmet/ports/1
$ echo tcp > /sys/kernel/config/nvmet/ports/1/addr_trtype
$ echo ipv4 > /sys/kernel/config/nvmet/ports/1/addr_adrfam
$ echo 127.0.0.1 > /sys/kernel/config/nvmet/ports/1/addr_traddr
$ echo 4420 > /sys/kernel/config/nvmet/ports/1/addr_trsvcid
$ echo required > /sys/kernel/config/nvmet/ports/1/addr_treq
$ echo tls1.3 > /sys/kernel/config/nvmet/ports/1/addr_tsas
$ ln -s /sys/kernel/config/nvmet/subsystems/nqn.nvmet /sys/kernel/config/nvmet/ports/1/subsystems
$ nvme gen-tls-key --subsysnqn nqn.nvmet --insert
Inserted TLS key 005e8a74
$ nvme gen-tls-key --subsysnqn nqn.2014-08.org.nvmexpress.discovery --insert
Inserted TLS key 22d676b8
$ tlshd &
$ nvme discover --transport tcp --traddr 127.0.0.1 --trsvcid 4420 --tls
Failed to write to /dev/nvme-fabrics: Input/output error
failed to add controller, error failed to write to nvme-fabrics device

With debug logs enabled, I see the following:
$ dmesg | tail -6
[ 440.405298] nvme nvme0: connecting queue 0
[ 440.405403] nvmet_tcp: queue 0: TLS ServerHello
[ 440.405433] nvme nvme0: queue 0: start TLS with key 11b456f9
[ 440.407836] nvmet_tcp: queue 0: TLS handshake done, key 0, status -13
[ 440.422881] nvme nvme0: queue 0: TLS handshake done, key 0, status -13
[ 440.422932] nvme nvme0: queue 0: TLS handshake complete, error 13
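
(For what it's worth, the status -13 in both handshake logs is just
-EACCES, i.e. the same permission error as above; a trivial Python check:)
```
import errno, os
# -13 from the TLS handshake logs is -EACCES ("Permission denied")
print(errno.errorcode[13], os.strerror(13))  # EACCES Permission denied
```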

A tcpdump shows the host sending a TLS Client Hello packet and the
target immediately closing the connection.

> Do you have a reliable way to reproduce this?

Sure, here's a fake Python NVMe/TCP controller that immediately closes
each connection after receiving the ICReq PDU:
```
import socket

def recv_all(sock, length):
    # Read exactly `length` bytes; bail out if the peer disconnects early
    result = b''
    while len(result) < length:
        chunk = sock.recv(length - len(result))
        if not chunk:
            raise ConnectionError('peer closed the connection')
        result += chunk
    return result

listen_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listen_sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
listen_sock.bind(('', 4420))
listen_sock.listen()
while True:
    client_sock, _ = listen_sock.accept()
    recv_all(client_sock, 128)  # ICReq
    client_sock.close()
```
Attempting to connect to it reports an error about the ICResp PDU type
field even though no ICResp was sent:
$ nvme connect --transport tcp --traddr 192.168.1.12 --nqn nqn.abc
Failed to write to /dev/nvme-fabrics: Invalid argument
could not add new controller: invalid arguments/configuration
$ dmesg | tail -1
[1351639.614853] nvme_tcp: queue 0: bad type returned 0
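
In case it helps to confirm the reproducer really is being handed an
ICReq before it closes the connection, here's a small sketch (my own
hypothetical check_icreq() helper, not part of the script above; field
offsets are from my reading of the NVMe/TCP spec) that could be run on
the 128 bytes returned by recv_all():
```
import struct

def check_icreq(data):
    # Hypothetical helper, not part of the reproducer: sanity-check that
    # the 128 received bytes look like an ICReq PDU.
    assert len(data) == 128
    # Common Header: PDU-type, FLAGS, HLEN, PDO, PLEN (PLEN little-endian)
    pdu_type, flags, hlen, pdo, plen = struct.unpack_from('<BBBBI', data, 0)
    assert pdu_type == 0x00 and hlen == 128 and plen == 128, 'not an ICReq'
    # ICReq PDU Specific Header: PFV, HPDA, DGST, MAXR2T
    pfv, hpda, dgst, maxr2t = struct.unpack_from('<HBBI', data, 8)
    print(f'ICReq: PFV={pfv} HPDA={hpda} DGST={dgst:#x} MAXR2T={maxr2t}')
```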

Here's a valid scenario where the controller sends the ICResp Common
Header and PDU Specific Header separately (using TCP_NODELAY to ensure
the sends are not coalesced):
```
import socket

def recv_all(sock, length):
    # Read exactly `length` bytes; bail out if the peer disconnects early
    result = b''
    while len(result) < length:
        chunk = sock.recv(length - len(result))
        if not chunk:
            raise ConnectionError('peer closed the connection')
        result += chunk
    return result

def send_all(sock, data):
    # Keep calling send() until the whole buffer has been written
    while data:
        data = data[sock.send(data):]

listen_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listen_sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
# Accepted sockets inherit TCP_NODELAY from the listener on Linux
listen_sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
listen_sock.bind(('', 4420))
listen_sock.listen()
while True:
    client_sock, _ = listen_sock.accept()
    recv_all(client_sock, 128)  # ICReq
    common_header = bytes([
        0x01,          # PDU-type (ICResp)
        0,             # FLAGS
        128,           # HLEN
        0,             # PDO
        128, 0, 0, 0,  # PLEN
    ])
    send_all(client_sock, common_header)
    ic_resp = bytes([
        0, 0,                    # PFV
        0,                       # CPDA
        0,                       # DGST
        0xFC, 0xFF, 0xFF, 0xFF,  # MAXH2CDATA
    ] + [0] * 112)
    send_all(client_sock, ic_resp)
    client_sock.close()
```
The host refuses to connect, complaining that the MAXH2CDATA field in
the ICResp is invalid. But that is because it had received only the
Common Header when it validated the ICResp.
$ nvme connect --transport tcp --traddr 192.168.1.12 --nqn nqn.abc
Failed to write to /dev/nvme-fabrics: Invalid argument
could not add new controller: invalid arguments/configuration
$ dmesg | tail -1
[1351960.082011] nvme_tcp: queue 0: invalid maxh2cdata returned 0
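
For reference, the MAXH2CDATA bytes in the script decode (little-endian)
to 0xFFFFFFFC, so the 0 in the log can only reflect the host parsing the
PDU Specific Header before it had arrived. A quick sketch to double-check
the encoding (just re-building the same bytes the script sends):
```
import struct

# The PDU Specific Header sent by the script above, reconstructed here to
# show what the host would have seen had it waited for the full ICResp
psh = bytes([
    0, 0,                    # PFV
    0,                       # CPDA
    0,                       # DGST
    0xFC, 0xFF, 0xFF, 0xFF,  # MAXH2CDATA
] + [0] * 112)
maxh2cdata = struct.unpack_from('<I', psh, 4)[0]
print(hex(maxh2cdata))  # 0xfffffffc, not the 0 the host logged
```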

With the patch applied, the controller closing the connection without
sending an ICResp PDU correctly results in a "Connection reset by peer"
error:
$ nvme connect --transport tcp --traddr 192.168.1.12 --nqn nqn.abc
Failed to write to /dev/nvme-fabrics: Connection reset by peer
could not add new controller: failed to write to nvme-fabrics device
$ dmesg | tail -1
[ 450.050463] nvme_tcp: queue 0: failed to receive icresp, error 0

And when the controller sends the Common Header separately from the
PDU Specific Header, nvme_tcp_init_connection() now succeeds. The
connection attempt instead times out waiting for a response to the
Fabrics Connect command, since the fake NVMe/TCP controller doesn't
implement that:
$ nvme connect --transport tcp --traddr 192.168.1.12 --nqn nqn.abc
Failed to write to /dev/nvme-fabrics: Input/output error
could not add new controller: failed to write to nvme-fabrics device
$ dmesg | tail -3
[ 644.728894] nvme nvme0: I/O tag 0 (0000) type 4 opcode 0x7f (Connect) QID 0 timeout
[ 644.728974] nvme nvme0: Connect command failed, error wo/DNR bit: 881
[ 644.728999] nvme nvme0: failed to connect queue: 0 ret=881

Thanks,
Caleb