Re: [PATCH] nvmet-tcp: Ensure old keys are freed before replacing new ones
From: Chris Leech
Date: Thu Apr 16 2026 - 11:30:28 EST
On Thu, Apr 16, 2026 at 08:16:14AM +0200, Hannes Reinecke wrote:
> On 4/16/26 01:02, alistair23@xxxxxxxxx wrote:
> > From: Alistair Francis <alistair.francis@xxxxxxx>
> >
> > Previously after the host sends a REPLACETLSPSK we freed the TLS keys as
> > part of calling nvmet_auth_sq_free() on success. A recent change ensured
> > we don't free the keys, allowing REPLACETLSPSK to work.
> >
> > But that fix results in a kernel memory leak when running
> >
> > ```
> > nvme_trtype=loop ./check nvme/041 nvme/042 nvme/043 nvme/044 nvme/045 nvme/051 nvme/052
> > echo scan > /sys/kernel/debug/kmemleak
> > cat /sys/kernel/debug/kmemleak
> > ```
> >
> > We can't free the keys on a successful DHCHAP operation, otherwise the
> > next REPLACETLSPSK will fail, so instead let's free them before we
> > replace them as part of nvmet_auth_challenge().
> >
> > This ensures that REPLACETLSPSK works, while also avoiding any memory
> > leaks.
> >
> > Fixes: 2e6eb6b277f59 ("nvmet-tcp: Don't free SQ on authentication success")
> > Signed-off-by: Alistair Francis <alistair.francis@xxxxxxx>
> > ---
> > drivers/nvme/target/fabrics-cmd-auth.c | 7 +++++++
> > 1 file changed, 7 insertions(+)
> >
> > diff --git a/drivers/nvme/target/fabrics-cmd-auth.c b/drivers/nvme/target/fabrics-cmd-auth.c
> > index b9ab80c7a6941..58185184478a4 100644
> > --- a/drivers/nvme/target/fabrics-cmd-auth.c
> > +++ b/drivers/nvme/target/fabrics-cmd-auth.c
> > @@ -412,6 +412,13 @@ static int nvmet_auth_challenge(struct nvmet_req *req, void *d, int al)
> > int hash_len = nvme_auth_hmac_hash_len(ctrl->shash_id);
> > int data_size = sizeof(*d) + hash_len;
> > + /*
> > + * If replacing the keys then we have previous successful keys
> > + * that might be leaked, so we need to free them here.
> > + */
> > + if (req->sq->dhchap_c1)
> > + nvmet_auth_sq_free(req->sq);
> > +
> > if (ctrl->dh_tfm)
> > data_size += ctrl->dh_keysize;
> > if (al < data_size) {
> I am not sure.
> The authentication variables should be freed as soon as the authentication
> completes; the session key is ephemeral and
> should not be stored longer than necessary and will _never_
> be used again once authentication completes.
> The TLS key, OTOH, is used throughout the session and needs
> to be present while the session is active
> As such, both sets have vastly different lifetimes, and
> I would argue that this
>
> void nvmet_auth_sq_free(struct nvmet_sq *sq)
> {
> cancel_delayed_work(&sq->auth_expired_work);
> #ifdef CONFIG_NVME_TARGET_TCP_TLS
> sq->tls_key = NULL;
> #endif
> kfree(sq->dhchap_c1);
> sq->dhchap_c1 = NULL;
>
> is actually wrong as we should not modify 'tls_key' here.
I agree with Hannes, and was just about to respond with the same
feedback. I think the freeing of the auth temporaries needs to be
returned to fix the memleak, and the real problem is the setting of
tls_key to NULL. That doesn't seem like the right lifetime for tls_key,
and it looks to be a reference count leak as well.
Is the presence of sq->tls_key the best check to see if the socket is
currently in a kTLS mode? (it might be, I'm not as up on the target
code)
- Chris