Re: [PATCH] nfsd: fix incorrect high limit in clamp() on over-allocation

From: Geert Uytterhoeven
Date: Tue Dec 24 2024 - 04:17:00 EST


Hi Chuck,

On Mon, Dec 23, 2024 at 6:49 PM Chuck Lever <chuck.lever@xxxxxxxxxx> wrote:
> On 12/23/24 11:06 AM, Geert Uytterhoeven wrote:
> > On Mon, Dec 9, 2024 at 3:48 PM Chuck Lever <chuck.lever@xxxxxxxxxx> wrote:
> >> On 12/9/24 7:25 AM, Vincent Mailhol via B4 Relay wrote:
> >>> From: Vincent Mailhol <mailhol.vincent@xxxxxxxxxx>
> >>>
> >>> If over-allocation occurs in nfsd4_get_drc_mem(), total_avail is set
> >>> to zero. Consequently,
> >>>
> >>> clamp_t(unsigned long, avail, slotsize, total_avail/scale_factor);
> >>>
> >>> gives:
> >>>
> >>> clamp_t(unsigned long, avail, slotsize, 0);
> >>>
> >>> resulting in a clamp() call where the high limit is smaller than the
> >>> low limit, which is undefined: the result could be either slotsize or
> >>> zero depending on the order of evaluation.
> >>>
> >>> Luckily, the two instructions just below the clamp() recover from the
> >>> undefined behaviour:
> >>>
> >>> num = min_t(int, num, avail / slotsize);
> >>> num = max_t(int, num, 1);
> >>>
> >>> If avail = slotsize, the min_t() sets it back to 1. If avail = 0, the
> >>> max_t() sets it back to 1.
> >>>
> >>> So this undefined behaviour has no visible effect.
> >>>
> >>> Anyway, remove the undefined behaviour in clamp() by only calling it
> >>> and only doing the calculation of num if memory is still available.
> >>> Otherwise, if over-allocation occurred, directly set num to 1 as
> >>> intended by the author.
> >>>
> >>> While at it, apply the checkpatch fix below:
> >>>
> >>> WARNING: min() should probably be min_t(unsigned long, NFSD_MAX_MEM_PER_SESSION, total_avail)
> >>> #100: FILE: fs/nfsd/nfs4state.c:1954:
> >>> + avail = min((unsigned long)NFSD_MAX_MEM_PER_SESSION, total_avail);
> >>>
> >>> Fixes: 7f49fd5d7acd ("nfsd: handle drc over-allocation gracefully.")
> >>> Signed-off-by: Vincent Mailhol <mailhol.vincent@xxxxxxxxxx>
> >>> ---
> >>> Found by applying the patch below from David:
> >>>
> >>> https://lore.kernel.org/all/34d53778977747f19cce2abb287bb3e6@xxxxxxxxxxxxxxxx/
> >>>
> >>> Doing so yields this report:
> >>>
> >>> In function ‘nfsd4_get_drc_mem’,
> >>> inlined from ‘check_forechannel_attrs’ at fs/nfsd/nfs4state.c:3791:16,
> >>> inlined from ‘nfsd4_create_session’ at fs/nfsd/nfs4state.c:3864:11:
> >>> ././include/linux/compiler_types.h:542:38: error: call to ‘__compiletime_assert_3707’ declared with attribute error: clamp() low limit (unsigned long)(slotsize) greater than high limit (unsigned long)(total_avail/scale_factor)
> >>> 542 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
> >>> | ^
> >>> ././include/linux/compiler_types.h:523:4: note: in definition of macro ‘__compiletime_assert’
> >>> 523 | prefix ## suffix(); \
> >>> | ^~~~~~
> >>> ././include/linux/compiler_types.h:542:2: note: in expansion of macro ‘_compiletime_assert’
> >>> 542 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
> >>> | ^~~~~~~~~~~~~~~~~~~
> >>> ./include/linux/build_bug.h:39:37: note: in expansion of macro ‘compiletime_assert’
> >>> 39 | #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
> >>> | ^~~~~~~~~~~~~~~~~~
> >>> ./include/linux/minmax.h:114:2: note: in expansion of macro ‘BUILD_BUG_ON_MSG’
> >>> 114 | BUILD_BUG_ON_MSG(statically_true(ulo > uhi), \
> >>> | ^~~~~~~~~~~~~~~~
> >>> ./include/linux/minmax.h:121:2: note: in expansion of macro ‘__clamp_once’
> >>> 121 | __clamp_once(val, lo, hi, __UNIQUE_ID(v_), __UNIQUE_ID(l_), __UNIQUE_ID(h_))
> >>> | ^~~~~~~~~~~~
> >>> ./include/linux/minmax.h:275:36: note: in expansion of macro ‘__careful_clamp’
> >>> 275 | #define clamp_t(type, val, lo, hi) __careful_clamp((type)(val), (type)(lo), (type)(hi))
> >>> | ^~~~~~~~~~~~~~~
> >>> fs/nfsd/nfs4state.c:1972:10: note: in expansion of macro ‘clamp_t’
> >>> 1972 | avail = clamp_t(unsigned long, avail, slotsize,
> >>> | ^~~~~~~
> >>>
> >>> Because David's patch is targeting Andrew's mm tree, I would suggest
> >>> that my patch also goes to that tree.
> >>> ---
> >>> fs/nfsd/nfs4state.c | 46 +++++++++++++++++++++++++---------------------
> >>> 1 file changed, 25 insertions(+), 21 deletions(-)
> >>>
> >>> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> >>> index 741b9449f727defc794347f1b116c955d715e691..eb91460c434e30f6df70f66d937f8c0f334b8e1b 100644
> >>> --- a/fs/nfsd/nfs4state.c
> >>> +++ b/fs/nfsd/nfs4state.c
> >>> @@ -1944,35 +1944,39 @@ static u32 nfsd4_get_drc_mem(struct nfsd4_channel_attrs *ca, struct nfsd_net *nn
> >>>  {
> >>>  	u32 slotsize = slot_bytes(ca);
> >>>  	u32 num = ca->maxreqs;
> >>> -	unsigned long avail, total_avail;
> >>> -	unsigned int scale_factor;
> >>>
> >>>  	spin_lock(&nfsd_drc_lock);
> >>> -	if (nfsd_drc_max_mem > nfsd_drc_mem_used)
> >>> +	if (nfsd_drc_max_mem > nfsd_drc_mem_used) {
> >>> +		unsigned long avail, total_avail;
> >>> +		unsigned int scale_factor;
> >>> +
> >>>  		total_avail = nfsd_drc_max_mem - nfsd_drc_mem_used;
> >>> -	else
> >>> +		avail = min_t(unsigned long,
> >>> +			      NFSD_MAX_MEM_PER_SESSION, total_avail);
> >>> +		/*
> >>> +		 * Never use more than a fraction of the remaining memory,
> >>> +		 * unless it's the only way to give this client a slot.
> >>> +		 * The chosen fraction is either 1/8 or 1/number of threads,
> >>> +		 * whichever is smaller. This ensures there are adequate
> >>> +		 * slots to support multiple clients per thread.
> >>> +		 * Give the client one slot even if that would require
> >>> +		 * over-allocation--it is better than failure.
> >>> +		 */
> >>> +		scale_factor = max_t(unsigned int,
> >>> +				     8, nn->nfsd_serv->sv_nrthreads);
> >>> +
> >>> +		avail = clamp_t(unsigned long, avail, slotsize,
> >>> +				total_avail/scale_factor);
> >>> +		num = min_t(int, num, avail / slotsize);
> >>> +		num = max_t(int, num, 1);
> >>> +	} else {
> >>>  		/* We have handed out more space than we chose in
> >>>  		 * set_max_drc() to allow. That isn't really a
> >>>  		 * problem as long as that doesn't make us think we
> >>>  		 * have lots more due to integer overflow.
> >>>  		 */
> >>> -		total_avail = 0;
> >>> -	avail = min((unsigned long)NFSD_MAX_MEM_PER_SESSION, total_avail);
> >>> -	/*
> >>> -	 * Never use more than a fraction of the remaining memory,
> >>> -	 * unless it's the only way to give this client a slot.
> >>> -	 * The chosen fraction is either 1/8 or 1/number of threads,
> >>> -	 * whichever is smaller. This ensures there are adequate
> >>> -	 * slots to support multiple clients per thread.
> >>> -	 * Give the client one slot even if that would require
> >>> -	 * over-allocation--it is better than failure.
> >>> -	 */
> >>> -	scale_factor = max_t(unsigned int, 8, nn->nfsd_serv->sv_nrthreads);
> >>> -
> >>> -	avail = clamp_t(unsigned long, avail, slotsize,
> >>> -			total_avail/scale_factor);
> >>> -	num = min_t(int, num, avail / slotsize);
> >>> -	num = max_t(int, num, 1);
> >>> +		num = 1;
> >>> +	}
> >>>  	nfsd_drc_mem_used += num * slotsize;
> >>>  	spin_unlock(&nfsd_drc_lock);
> >>>
> >>>
> >>> ---
> >>> base-commit: fac04efc5c793dccbd07e2d59af9f90b7fc0dca4
> >>> change-id: 20241209-nfs4state_fix-bc6f1c1fc1d1
> >
> >> We're replacing this code wholesale in 6.14. See:
> >>
> >> https://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git/commit/?h=nfsd-testing&id=8233f78fbd970cbfcb9f78c719ac5a3aac4ea053
> >
> > Bad commit reference?
>
> Expired commit reference. That commit lives in a testing branch, which
> has subsequently been rebased. The current reference is:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git/commit/?h=nfsd-testing&id=94af736b97fbd8d02d66b3a0271f9c618f446bf0
>
> > And hence this is still failing in next-20241220...
>
> I haven't moved these commits to the nfsd-next branch yet.
>
> Is there an urgency for this fix? Is this a problem in LTS kernels

Currently there are build failures in linux-next due to this, possibly
hiding other issues.

> that might need a backport? 94af736 is not intended to be backported.

We'll see if the issue ever shows up in stable.
I understand it is exposed by stricter checking in the min/max macros.
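
For anyone following along, here is a minimal user-space sketch (not the
kernel's minmax.h macros) of why a clamp whose low limit exceeds its high
limit has no single answer, and why the min_t()/max_t() pair quoted above
still ends up at one slot. The helper names (min_ul, max_ul, clamp_hi_last,
clamp_lo_last, slots_granted) and the 2048-byte slot size are made up for
illustration; only the two follow-up steps mirror the quoted code.

```c
#include <assert.h>

/* Simplified stand-ins for illustration; NOT the kernel's min()/max(). */
static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

static unsigned long max_ul(unsigned long a, unsigned long b)
{
	return a > b ? a : b;
}

/* One possible clamp: apply the low limit first, the high limit last. */
static unsigned long clamp_hi_last(unsigned long val, unsigned long lo,
				   unsigned long hi)
{
	return min_ul(max_ul(val, lo), hi);
}

/* The other possible clamp: apply the high limit first, the low limit last. */
static unsigned long clamp_lo_last(unsigned long val, unsigned long lo,
				   unsigned long hi)
{
	return max_ul(min_ul(val, hi), lo);
}

/* Mirrors the two lines after clamp_t(): cap by avail/slotsize, floor at 1. */
static int slots_granted(int num, unsigned long avail, unsigned long slotsize)
{
	int fit = (int)(avail / slotsize);

	if (num > fit)
		num = fit;
	if (num < 1)
		num = 1;
	return num;
}

/* Over-allocation case: total_avail == 0, so the high limit is 0. */
static void check_over_allocation_case(void)
{
	unsigned long slotsize = 2048;	/* hypothetical slot size */
	unsigned long a = clamp_hi_last(0, slotsize, 0);
	unsigned long b = clamp_lo_last(0, slotsize, 0);

	assert(a == 0);			/* high limit wins */
	assert(b == slotsize);		/* low limit wins */
	assert(slots_granted(16, a, slotsize) == 1);
	assert(slots_granted(16, b, slotsize) == 1);
}
```

Calling check_over_allocation_case() from a small test main() should pass:
the clamp result differs between the two orderings, but num comes out as 1
either way, which matches the "no visible effect" observation in the commit
message.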

Thanks!

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@xxxxxxxxxxxxxx

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds