[PATCH v4 0/5] nvme-fabrics: short-circuit connect retries

From: Daniel Wagner
Date: Thu Apr 04 2024 - 11:45:29 EST


A rebase of Hannes' two series, which fix the TCP and RDMA transports to handle
the DNR bit on connect attempts.
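
To illustrate the idea for readers who have not followed the earlier versions,
here is a minimal userspace sketch of the decision the series is after: once the
controller answers a connect attempt with the DNR (Do Not Retry) bit set, further
reconnect attempts are pointless and the host should give up right away. This is
not the code from the patches. The helper name, its parameters and the
SKETCH_SC_DNR constant are made up for illustration, and the 0x4000 value assumes
the status layout used by the Linux NVMe headers.

```c
#include <stdbool.h>

/* Assumption: DNR bit position as in the Linux NVMe status encoding. */
#define SKETCH_SC_DNR 0x4000

/*
 * Hypothetical helper, not taken from the series.
 *
 * status < 0:  local errno-style error (e.g. transport failure)
 * status > 0:  NVMe status code returned by the controller
 * attempts:    number of reconnect attempts made so far
 * max_reconnects: retry budget, -1 means "retry forever" in this sketch
 */
static bool sketch_should_reconnect(int status, int attempts,
				    int max_reconnects)
{
	/* The controller said "do not retry": short-circuit the retries. */
	if (status > 0 && (status & SKETCH_SC_DNR))
		return false;

	/* Otherwise keep retrying while the budget allows it. */
	if (max_reconnects == -1 || attempts < max_reconnects)
		return true;

	return false;
}
```

With such a check in place, a failed authentication that the target reports with
DNR set ends the reconnect loop on the first attempt, while transient transport
errors still go through the normal retry budget.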

For testing I extended the nvme/045 test case. I'll update the test case later
when the current batch of blktests changes is done. This change also depends on
the extension of the nvmet debugfs interface, which is not yet merged either.

echo "Renew host key on the controller and force reconnect"

new_hostkey="$(nvme gen-dhchap-key -n ${def_subsysnqn} 2> /dev/null)"

_set_nvmet_hostkey "${def_hostnqn}" "${new_hostkey}"

# Force a reconnect
nvmedev=$(_find_nvme_dev "${def_subsysnqn}")
cntlid="$(nvme id-ctrl "/dev/${nvmedev}" | grep cntlid | awk '{print $3}')"
echo "fatal" > /sys/kernel/debug/nvmet/"${def_subsysnqn}/ctrl$((${cntlid}))"/state
nvmf_wait_for_ctrl_delete "${nvmedev}"


baseline:

run 1 loop (nvmet_blkdev_type file)
nvme/045 (Test re-authentication) [passed]
runtime 2.690s ... 2.777s
run 1 tcp (nvmet_blkdev_type file)
nvme/045 (Test re-authentication) [failed]
runtime 2.777s ... 8.030s
--- tests/nvme/045.out 2024-04-04 16:14:22.547250311 +0200
+++ /home/wagi/work/blktests/results/nodev/nvme/045.out.bad 2024-04-04 17:29:03.427799336 +0200
@@ -9,5 +9,6 @@
Change hash to hmac(sha512)
Re-authenticate with changed hash
Renew host key on the controller and force reconnect
-disconnected 0 controller(s)
+controller "nvme2" not deleted within 5 seconds
+disconnected 1 controller(s)
Test complete
run 1 rdma (nvmet_blkdev_type file)
nvme/045 (Test re-authentication) [failed]
runtime 8.030s ... 9.632s
--- tests/nvme/045.out 2024-04-04 16:14:22.547250311 +0200
+++ /home/wagi/work/blktests/results/nodev/nvme/045.out.bad 2024-04-04 17:29:15.017745115 +0200
@@ -9,5 +9,6 @@
Change hash to hmac(sha512)
Re-authenticate with changed hash
Renew host key on the controller and force reconnect
-disconnected 0 controller(s)
+controller "nvme2" not deleted within 5 seconds
+disconnected 1 controller(s)
Test complete
run 1 fc (nvmet_blkdev_type file)
nvme/045 (Test re-authentication) [passed]
runtime 9.632s ... 3.588s


patched:

run 1 loop (nvmet_blkdev_type file)
nvme/045 (Test re-authentication) [passed]
runtime 6.816s ... 2.492s
run 1 tcp (nvmet_blkdev_type file)
nvme/045 (Test re-authentication) [passed]
runtime 2.492s ... 3.663s
run 1 rdma (nvmet_blkdev_type file)
nvme/045 (Test re-authentication) [passed]
runtime 3.663s ... 3.795s
run 1 fc (nvmet_blkdev_type file)
nvme/045 (Test re-authentication) [passed]
runtime 3.795s ... 2.690s



changes:
v4:
- rebased
- added 'nvme: fixes for authentication errors' series
https://lore.kernel.org/linux-nvme/20240301112823.132570-1-hare@xxxxxxxxxx/

v3:
- added my SOB tag
- fixed indentation
- https://lore.kernel.org/linux-nvme/20240305080005.3638-1-dwagner@xxxxxxx/

v2:
- refresh/rebase on current head
- extended blktests (nvme/045) to cover this case
(see separate post)
- https://lore.kernel.org/linux-nvme/20240304161006.19328-1-dwagner@xxxxxxx/

v1:
- initial version
- https://lore.kernel.org/linux-nvme/20210623143250.82445-1-hare@xxxxxxx/



Hannes Reinecke (5):
nvme: authentication error are always non-retryable
nvmet: lock config semaphore when accessing DH-HMAC-CHAP key
nvmet: return DHCHAP status codes from nvmet_setup_auth()
nvme-tcp: short-circuit reconnect retries
nvme-rdma: short-circuit reconnect retries

drivers/nvme/host/core.c | 6 +++---
drivers/nvme/host/fabrics.c | 29 +++++++++++++++-----------
drivers/nvme/host/nvme.h | 19 ++++++++++++++++-
drivers/nvme/host/rdma.c | 22 ++++++++++++-------
drivers/nvme/host/tcp.c | 23 +++++++++++++-------
drivers/nvme/target/auth.c | 20 ++++++++----------
drivers/nvme/target/configfs.c | 22 ++++++++++++++-----
drivers/nvme/target/fabrics-cmd-auth.c | 11 +++++-----
8 files changed, 100 insertions(+), 52 deletions(-)

--
2.44.0