Re: [PATCH] nvmet-tcp: Ensure old keys are freed before replacing new ones

From: Alistair Francis

Date: Thu Apr 16 2026 - 20:50:43 EST


On Fri, Apr 17, 2026 at 1:27 AM Chris Leech <cleech@xxxxxxxxxx> wrote:
>
> On Thu, Apr 16, 2026 at 08:16:14AM +0200, Hannes Reinecke wrote:
> > On 4/16/26 01:02, alistair23@xxxxxxxxx wrote:
> > > From: Alistair Francis <alistair.francis@xxxxxxx>
> > >
> > > Previously after the host sends a REPLACETLSPSK we freed the TLS keys as
> > > part of calling nvmet_auth_sq_free() on success. A recent change ensured
> > > we don't free the keys, allowing REPLACETLSPSK to work.
> > >
> > > But that fix results in a kernel memory leak when running
> > >
> > > ```
> > > nvme_trtype=loop ./check nvme/041 nvme/042 nvme/043 nvme/044 nvme/045 nvme/051 nvme/052
> > > echo scan > /sys/kernel/debug/kmemleak
> > > cat /sys/kernel/debug/kmemleak
> > > ```
> > >
> > > We can't free the keys on a successful DHCHAP operation, otherwise the
> > > next REPLACETLSPSK will fail, so instead let's free them before we
> > > replace them as part of nvmet_auth_challenge().
> > >
> > > This ensures that REPLACETLSPSK works, while also avoiding any memory
> > > leaks.
> > >
> > > Fixes: 2e6eb6b277f59 ("nvmet-tcp: Don't free SQ on authentication success")
> > > Signed-off-by: Alistair Francis <alistair.francis@xxxxxxx>
> > > ---
> > > drivers/nvme/target/fabrics-cmd-auth.c | 7 +++++++
> > > 1 file changed, 7 insertions(+)
> > >
> > > diff --git a/drivers/nvme/target/fabrics-cmd-auth.c b/drivers/nvme/target/fabrics-cmd-auth.c
> > > index b9ab80c7a6941..58185184478a4 100644
> > > --- a/drivers/nvme/target/fabrics-cmd-auth.c
> > > +++ b/drivers/nvme/target/fabrics-cmd-auth.c
> > > @@ -412,6 +412,13 @@ static int nvmet_auth_challenge(struct nvmet_req *req, void *d, int al)
> > > int hash_len = nvme_auth_hmac_hash_len(ctrl->shash_id);
> > > int data_size = sizeof(*d) + hash_len;
> > > + /*
> > > + * If replacing the keys then we have previous successful keys
> > > + * that might be leaked, so we need to free them here.
> > > + */
> > > + if (req->sq->dhchap_c1)
> > > + nvmet_auth_sq_free(req->sq);
> > > +
> > > if (ctrl->dh_tfm)
> > > data_size += ctrl->dh_keysize;
> > > if (al < data_size) {
> > I am not sure.
> > The authentication variables should be freed as soon as the authentication
> > completes; the session key is ephemeral and
> > should not be stored longer than necessary and will _never_
> > be used again once authentication completes.
> > The TLS key, OTOH, is used throughout the session and needs
> > to be present while the session is active
> > As such, both sets have vastly different lifetimes, and
> > I would argue that this
> >
> > void nvmet_auth_sq_free(struct nvmet_sq *sq)
> > {
> > cancel_delayed_work(&sq->auth_expired_work);
> > #ifdef CONFIG_NVME_TARGET_TCP_TLS
> > sq->tls_key = NULL;
> > #endif
> > kfree(sq->dhchap_c1);
> > sq->dhchap_c1 = NULL;
> >
> > is actually wrong as we should not modify 'tls_key' here.
>
> I agree with Hannes, and was just about to respond with the same
> feedback. I think the freeing of the auth temporaries needs to be
> returned to fix the memleak, and the real problem is the setting of
> tls_key to NULL. That doesn't seem like the right lifetime for tls_key,
> and it looks to be a reference count leak as well.

Yep, agreed. Patches sent to revert the free and remove the
`sq->tls_key = NULL;`

>
> Is the presence of sq->tls_key the best check to see if the socket is
> currently in a kTLS mode? (it might be, I'm not as up on the target
> code)

I think it is correct, the issue is just that we are setting sq->tls_key to NULL

Alistair