Regression caused by commit c54f24e3 (nfsd: fix performance-limiting session calculation)

From: Paul Menzel
Date: Tue Jul 02 2019 - 21:36:04 EST

Dear Bruce,

Could it be that commit c54f24e3 (nfsd: fix performance-limiting session calculation) causes a regression on big-memory machines (1 TB)?

From c54f24e338ed2a35218f117a4a1afb5f9e2b4e64 Mon Sep 17 00:00:00 2001
From: "J. Bruce Fields" <bfields@xxxxxxxxxx>
Date: Thu, 21 Feb 2019 10:47:00 -0500
Subject: [PATCH] nfsd: fix performance-limiting session calculation

We're unintentionally limiting the number of slots per nfsv4.1 session
to 10. Often more than 10 simultaneous RPCs are needed for the best
performance.

This calculation was meant to prevent any one client from using up more
than a third of the limit we set for total memory use across all clients
and sessions. Instead, it's limiting the client to a third of the
maximum for a single session.

Fix this.

Reported-by: Chris Tracy <ctracy@xxxxxxxxxxxx>
Cc: stable@xxxxxxxxxxxxxxx
Fixes: de766e570413 "nfsd: give out fewer session slots as limit approaches"
Signed-off-by: J. Bruce Fields <bfields@xxxxxxxxxx>
---
 fs/nfsd/nfs4state.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index fb3c9844c82a..6a45fb00c5fc 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -1544,16 +1544,16 @@ static u32 nfsd4_get_drc_mem(struct nfsd4_channel_attrs *ca)
 	u32 slotsize = slot_bytes(ca);
 	u32 num = ca->maxreqs;
-	int avail;
+	unsigned long avail, total_avail;
-	avail = min((unsigned long)NFSD_MAX_MEM_PER_SESSION,
-		    nfsd_drc_max_mem - nfsd_drc_mem_used);
+	total_avail = nfsd_drc_max_mem - nfsd_drc_mem_used;
+	avail = min((unsigned long)NFSD_MAX_MEM_PER_SESSION, total_avail);
 	/*
 	 * Never use more than a third of the remaining memory,
 	 * unless it's the only way to give this client a slot:
 	 */
-	avail = clamp_t(int, avail, slotsize, avail/3);
+	avail = clamp_t(int, avail, slotsize, total_avail/3);
 	num = min_t(int, num, avail / slotsize);
 	nfsd_drc_mem_used += num * slotsize;

Booting an 80-thread server with 1 TB of memory running either Linux 4.19.56 or Linux 5.2-rc7 causes connection problems for the clients. The problems do not occur on servers with less memory, for example 96 GB. Bisecting points to the two commits below (I can only continue tomorrow).

c54f24e338ed2a35218f117a4a1afb5f9e2b4e64 (nfsd: fix performance-limiting session calculation)
8127d82705998568b52ac724e28e00941538083d (NFS: Don't recoalesce on error in nfs_pageio_complete_mirror())

If there is anything I could do to verify this besides reverting it
tomorrow, please tell me. It'd be great if it could be fixed before Linux
5.2 is released.

Kind regards,