Re: [PATCH v6 5/5] nvme-fabrics: handle transient auth failures

From: Daniel Wagner
Date: Tue Apr 16 2024 - 02:57:30 EST


On Mon, Apr 15, 2024 at 11:51:19PM +0300, Sagi Grimberg wrote:
> - if (status < 0)
> > + if (status < 0) {
> > + /*
> > + * authentication errors can be transient, thus retry a couple
> > + * of times before giving up.
> > + */
> > + if (status == -EKEYREJECTED &&
> > + ++ctrl->nr_auth_retries < 3)
> > + return true;
>
> I did not suggest nr_auth_retries.

Correct. I explained why I didn't want to use nr_reconnect counter here.

> Where is the 3 coming from? The controller already
> has a number of reconnects before it gives up, no reason to add
> another one.

Sure, but seem to you consider all EKEYREJECTED errors are transient.
But this is just not correct. There is also the case where the key is
not correct on the first connect attempt for example, or the ctrl key
gets updated but not the host key.

> Just don't return false based on the status if it is a transient
> authentication error.

Not all authentication errors are transient.

> The patch just needs to be modified from
>     if (status < 0)
> to
>     if (status < 0 && status != -EKEYREJECTED Plus a comment that explains
> it.

This is not correct. This patch is not following the specification
already and I don't think we should bend the rules too much. Hence why I
limited the reconnect attempt on auth errors to 3 attempts.