Re: [Question] Lots of memory leaks when mounting and unmounting nfs client to server continuously.

From: Dave Wysochanski
Date: Wed Nov 07 2018 - 14:51:03 EST


On Tue, 2018-10-30 at 21:58 +0800, zhong jiang wrote:
> On 2018/10/30 21:06, Benjamin Coddington wrote:
> > Hi zhong jiang,
> >
> > Try asking on linux-nfs, but I'll also note that 3.10-stable may
> > be missing a number of fixes to leaks in the NFS GSS code.
> >
> > I can see more than a few fixes to memory leaks with:
> > git log --grep=leak --oneline net/sunrpc/auth_gss/
> >
>
> Thanks for your reply. I have tested some of them from upstream, as
> you suggested, but they fail to solve the issue completely.
> Hence, I am turning to the relevant experts to ask whether they have
> run into the issue or can give some suggestions.
>
> Thanks,
> zhong jiang
> > Ben
> >
> > On 30 Oct 2018, at 8:45, zhong jiang wrote:
> >
> > > Hi, Herbert
> > >
> > > Recently, I hit a memory leak issue when mounting and
> > > unmounting NFS using krb5.
> > > The issue happens on linux-3.10-stable.
> > >
> > > I find that slab-1024 and slab-512 take up most of the
> > > memory, and it cannot be freed.
> > > Meanwhile, as a result, rpcsec_gss_krb5 can be unregistered as
> > > well.
> > >
> > >

Are you running the latest 3.10-stable?

This sounds very similar to something I encountered a while ago, and it
was a sunrpc cache-related problem. The patch that fixed it for me is
in 3.10.106, though.

Can you check if this cache is growing indefinitely?
/proc/net/rpc/auth.rpcsec.context

If it is large, try to flush explicitly with:
date +%s > /proc/net/rpc/auth.rpcsec.context/flush

If all that checks out, you may need the below upstream fix, but it
went into v3.10.106 as
6a4a5fd svcrpc: don't leak contexts on PROC_DESTROY

commit 6a4a5fd4c7bc6a06ca26ad7327d046d8d3c0932a
Author: J. Bruce Fields <bfields@xxxxxxxxxx>
Date: Mon Jan 9 17:15:18 2017 -0500

svcrpc: don't leak contexts on PROC_DESTROY

commit 78794d1890708cf94e3961261e52dcec2cc34722 upstream.

Context expiry times are in units of seconds since boot, not unix time.

The use of get_seconds() here therefore sets the expiry time decades in
the future. This prevents timely freeing of contexts destroyed by
client RPC_GSS_PROC_DESTROY requests. We'd still free them eventually
(when the module is unloaded or the container shut down), but a lot of
contexts could pile up before then.

Fixes: c5b29f885afe "sunrpc: use seconds since boot in expiry cache"
Reported-by: Andy Adamson <andros@xxxxxxxxxx>
Signed-off-by: J. Bruce Fields <bfields@xxxxxxxxxx>
Signed-off-by: Willy Tarreau <w@xxxxxx>

diff --git a/net/sunrpc/auth_gss/svcauth_gss.c b/net/sunrpc/auth_gss/svcauth_gss.c
index 62663a0..e625efe 100644
--- a/net/sunrpc/auth_gss/svcauth_gss.c
+++ b/net/sunrpc/auth_gss/svcauth_gss.c
@@ -1518,7 +1518,7 @@ static void destroy_use_gss_proxy_proc_entry(struct net *net) {}
 	case RPC_GSS_PROC_DESTROY:
 		if (gss_write_verf(rqstp, rsci->mechctx, gc->gc_seq))
 			goto auth_err;
-		rsci->h.expiry_time = get_seconds();
+		rsci->h.expiry_time = seconds_since_boot();
 		set_bit(CACHE_NEGATIVE, &rsci->h.flags);
 		if (resv->iov_len + 4 > PAGE_SIZE)
 			goto drop;