Re: [PATCH v5 1/6] nvme: authentication error are always non-retryable

From: Hannes Reinecke
Date: Wed Apr 10 2024 - 08:06:07 EST


On 4/10/24 12:21, Sagi Grimberg wrote:


On 10/04/2024 9:52, Daniel Wagner wrote:
On Tue, Apr 09, 2024 at 11:26:00PM +0300, Sagi Grimberg wrote:

On 09/04/2024 12:35, Daniel Wagner wrote:
From: Hannes Reinecke <hare@xxxxxxx>

Any authentication errors which are generated internally are always
non-retryable, so use negative error codes to ensure they are not
retried.
The patch title says that any authentication error is not retryable, and
the patch body says "authentication errors which are generated locally
are non-retryable" so which one is it?
Forgot to update the commit message. What about:

   All authentication errors are non-retryable, so use negative error
   codes to ensure they are not retried.

?
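
To illustrate the convention the commit message relies on, here is a minimal sketch, assuming negative values are host-generated errno codes and positive values are status codes from the controller. `sketch_should_retry()` and `SKETCH_STATUS_DNR` are hypothetical names made up for this example; this is not the actual kernel code path.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical "do not retry" bit in a positive, controller-supplied status. */
#define SKETCH_STATUS_DNR 0x4000

/*
 * Hypothetical helper, not a kernel function.
 *   status < 0:  errno generated on the host, treated as final
 *   status == 0: success, nothing to retry
 *   status > 0:  status from the controller, retried unless DNR is set
 */
static bool sketch_should_retry(int status)
{
	if (status <= 0)
		return false;
	return !(status & SKETCH_STATUS_DNR);
}
```

Under these assumptions, an authentication failure reported as `-EACCES` is never retried, while a positive status without the DNR bit still is.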

I have a question: what happens if nvmet has its credentials updated (by
the admin) and, before the host gets its own credentials updated, the host
happens to disconnect/reconnect? It will see an authentication
error, so it will not retry and will remove the controller altogether?

Sounds like an issue to me.

Usual thing: we cannot differentiate (on the host side) whether the
current PSK is _about_ to be replaced; how should the kernel
know that the admin will replace the PSK in the next minutes?

But that really is an issue with the standard. Currently there is no
way for a target to inform the initiator that the credentials have
been updated.

We would need to define a new status code for this.
In the meantime the safe operational model is to set a lifetime
for each PSK, and to ensure that the PSK is updated on both sides
within that lifetime. With that there is a timeframe during which
both PSKs are available (on the target), and the older one will expire
automatically once its lifetime limit is reached.
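
As a rough sketch of that overlap window, assuming each PSK on the target carries an installation time and a lifetime in seconds: `struct sketch_psk` and both helpers below are hypothetical and only illustrate the idea; they are not part of nvmet.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical record for one PSK on the target; not a kernel structure. */
struct sketch_psk {
	time_t installed;	/* when the key was provisioned */
	time_t lifetime;	/* seconds the key stays valid */
};

/* A key is usable from installation until its lifetime runs out. */
static bool sketch_psk_valid(const struct sketch_psk *k, time_t now)
{
	return now >= k->installed && now < k->installed + k->lifetime;
}

/*
 * The target accepts the host if the key the host presents (old or new)
 * is still inside its lifetime.
 */
static bool sketch_target_accepts(const struct sketch_psk *old_key,
				  const struct sketch_psk *new_key,
				  bool host_uses_new, time_t now)
{
	return sketch_psk_valid(host_uses_new ? new_key : old_key, now);
}
```

As long as the new key is installed before the old one expires, a host that reconnects during the overlap authenticates with either key.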

Cheers,

Hannes