Re: [External] : nfsd: memory leak when client does many file operations
From: Jan Schunk
Date: Mon Apr 01 2024 - 13:35:33 EST
Hi,
the bug report is now here:
https://bugzilla.kernel.org/show_bug.cgi?id=218671
PS: I can also confirm that nfsd works without any issue on the latest v6.6.22 when only e18e157bb5c8 is reverted.
> Sent: Monday, April 1, 2024 at 16:08
> From: "Chuck Lever III" <chuck.lever@xxxxxxxxxx>
> To: "Jan Schunk" <scpcom@xxxxxx>
> Cc: "Benjamin Coddington" <bcodding@xxxxxxxxxx>, "Jeff Layton" <jlayton@xxxxxxxxxx>, "Neil Brown" <neilb@xxxxxxx>, "Olga Kornievskaia" <kolga@xxxxxxxxxx>, "Dai Ngo" <dai.ngo@xxxxxxxxxx>, "Tom Talpey" <tom@xxxxxxxxxx>, "Linux NFS Mailing List" <linux-nfs@xxxxxxxxxxxxxxx>, "linux-kernel@xxxxxxxxxxxxxxx" <linux-kernel@xxxxxxxxxxxxxxx>, "David Howells" <dhowells@xxxxxxxxxx>, "Linux regressions mailing list" <regressions@xxxxxxxxxxxxxxx>
> Subject: Re: [External] : nfsd: memory leak when client does many file operations
>
>
>
> > On Mar 30, 2024, at 12:26 PM, Chuck Lever <chuck.lever@xxxxxxxxxx> wrote:
> >
> > On Sat, Mar 30, 2024 at 04:26:09PM +0100, Jan Schunk wrote:
> >> Full test result:
> >>
> >> $ git bisect start v6.6 v6.5
> >> Bisecting: 7882 revisions left to test after this (roughly 13 steps)
> >> [a1c19328a160c80251868dbd80066dce23d07995] Merge tag 'soc-arm-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
> >> --
> >> $ git bisect good
> >> Bisecting: 3935 revisions left to test after this (roughly 12 steps)
> >> [e4f1b8202fb59c56a3de7642d50326923670513f] Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
> >> --
> >> $ git bisect bad
> >> Bisecting: 2014 revisions left to test after this (roughly 11 steps)
> >> [e0152e7481c6c63764d6ea8ee41af5cf9dfac5e9] Merge tag 'riscv-for-linus-6.6-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
> >> --
> >> $ git bisect bad
> >> Bisecting: 975 revisions left to test after this (roughly 10 steps)
> >> [4a3b1007eeb26b2bb7ae4d734cc8577463325165] Merge tag 'pinctrl-v6.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
> >> --
> >> $ git bisect good
> >> Bisecting: 476 revisions left to test after this (roughly 9 steps)
> >> [4debf77169ee459c46ec70e13dc503bc25efd7d2] Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd
> >> --
> >> $ git bisect good
> >> Bisecting: 237 revisions left to test after this (roughly 8 steps)
> >> [e7e9423db459423d3dcb367217553ad9ededadc9] Merge tag 'v6.6-vfs.super.fixes.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
> >> --
> >> $ git bisect good
> >> Bisecting: 141 revisions left to test after this (roughly 7 steps)
> >> [8ae5d298ef2005da5454fc1680f983e85d3e1622] Merge tag '6.6-rc-ksmbd-fixes-part1' of git://git.samba.org/ksmbd
> >> --
> >> $ git bisect good
> >> Bisecting: 61 revisions left to test after this (roughly 6 steps)
> >> [99d99825fc075fd24b60cc9cf0fb1e20b9c16b0f] Merge tag 'nfs-for-6.6-1' of git://git.linux-nfs.org/projects/anna/linux-nfs
> >> --
> >> $ git bisect bad
> >> Bisecting: 39 revisions left to test after this (roughly 5 steps)
> >> [7b719e2bf342a59e88b2b6215b98ca4cf824bc58] SUNRPC: change svc_recv() to return void.
> >> --
> >> $ git bisect bad
> >> Bisecting: 19 revisions left to test after this (roughly 4 steps)
> >> [e7421ce71437ec8e4d69cc6bdf35b6853adc5050] NFSD: Rename struct svc_cacherep
> >> --
> >> $ git bisect good
> >> Bisecting: 9 revisions left to test after this (roughly 3 steps)
> >> [baabf59c24145612e4a975f459a5024389f13f5d] SUNRPC: Convert svc_udp_sendto() to use the per-socket bio_vec array
> >> --
> >> $ git bisect bad
> >> Bisecting: 4 revisions left to test after this (roughly 2 steps)
> >> [be2be5f7f4436442d8f6bffbb97a6f438df2896b] lockd: nlm_blocked list race fixes
> >> --
> >> $ git bisect good
> >> Bisecting: 2 revisions left to test after this (roughly 1 step)
> >> [d424797032c6e24b44037e6c7a2d32fd958300f0] nfsd: inherit required unset default acls from effective set
> >> --
> >> $ git bisect good
> >> Bisecting: 0 revisions left to test after this (roughly 1 step)
> >> [e18e157bb5c8c1cd8a9ba25acfdcf4f3035836f4] SUNRPC: Send RPC message on TCP with a single sock_sendmsg() call
> >> --
> >> $ git bisect bad
> >> Bisecting: 0 revisions left to test after this (roughly 0 steps)
> >> [2eb2b93581813b74c7174961126f6ec38eadb5a7] SUNRPC: Convert svc_tcp_sendmsg to use bio_vecs directly
> >> --
> >> $ git bisect good
> >> e18e157bb5c8c1cd8a9ba25acfdcf4f3035836f4 is the first bad commit
> >> commit e18e157bb5c8c1cd8a9ba25acfdcf4f3035836f4
> >
> > This is a plausible bisect result for this behavior, so nice work.
> >
> > David (cc'd), can you have a brief look at this? What did we miss?
> > I'm guessing it's a page reference count issue that might occur
> > only when the XDR head and tail buffers are in the same page. Or
> > it might occur if two entries in the XDR page array point to the
> > same page...?
> >
> > /me stabs in the darkness
> >
> >
> >> In /proc/meminfo, the memory loss shows up only in MemAvailable:
> >> MemTotal: 346948 kB
> >> On a bad test run it looks like this:
> >> -MemAvailable: 210820 kB
> >> +MemAvailable: 26608 kB
> >> On a good test run it looks like this:
> >> -MemAvailable: 215872 kB
> >> +MemAvailable: 221128 kB
>
> Jan, may I ask one more favor? Since this might take a little
> time to run down, can you open a bug report on
> bugzilla.kernel.org, and copy in the symptomology and the
> bisect results? It will get assigned to Trond, and he can
> pass it to me.
>
> The problem appears to be in how we're using a page_frag_cache
> to handle the record marker buffers, but I'm not sure what the
> proper solution is yet.
>
> #regzbot ^introduced: e18e157bb5c8
>
> --
> Chuck Lever
>
>
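
For anyone who wants to repeat the MemAvailable comparison quoted above, here is a minimal sketch of how the before/after difference could be computed. It is not the script used for the original test runs; the function names (mem_available_kb, mem_available_delta_kb, read_meminfo) are made up for illustration, and it assumes the usual "MemAvailable: <n> kB" line format of /proc/meminfo.

```python
import re


def mem_available_kb(meminfo_text):
    """Return the MemAvailable value (in kB) from /proc/meminfo-style text."""
    # Assumes the line format "MemAvailable:   <number> kB".
    match = re.search(r"^MemAvailable:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("MemAvailable line not found")
    return int(match.group(1))


def mem_available_delta_kb(before_text, after_text):
    """Return after - before in kB; a large negative value suggests lost memory."""
    return mem_available_kb(after_text) - mem_available_kb(before_text)


def read_meminfo(path="/proc/meminfo"):
    """Read the current meminfo snapshot (hypothetical helper, Linux only)."""
    with open(path) as f:
        return f.read()
```

Take one snapshot with read_meminfo() before the client workload and one after, then pass both to mem_available_delta_kb(). With the figures quoted above, the bad run gives 26608 - 210820 = -184212 kB and the good run gives 221128 - 215872 = +5256 kB.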