Re: [PATCH] nfsd: move init of percpu reply_cache_stats counters back to nfsd_init_net
From: Chuck Lever III
Date: Sun Jun 18 2023 - 12:00:00 EST
> On Jun 18, 2023, at 8:09 AM, Jeff Layton <jlayton@xxxxxxxxxx> wrote:
>
> On Sun, 2023-06-18 at 12:40 +0200, Thorsten Leemhuis wrote:
>> On 16.06.23 22:54, Jeff Layton wrote:
>>> On Fri, 2023-06-16 at 16:27 -0400, Chuck Lever wrote:
>>>> Thanks Eirik and Jeff.
>>>>
>>>> At this point in the release cycle, I plan to apply this for the
>>>> next merge window (6.5).
>>>
>>> I think we should take this in sooner. This is a regression and a
>>> user-triggerable oops in the right situation. If:
>>>
>>> - non-x86_64 arch
>>> - /proc/fs/nfsd is mounted in the namespace
>>> - nfsd is not started in the namespace
>>> - unprivileged user calls "cat /proc/fs/nfsd/reply_cache_stats"
>>
>> FWIW, might be worth to simply tell Linus about it and let him decide,
>> that's totally fine and even documented in the old and the new docs for
>> handling regressions[1].
>>
>> [1]
>> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/Documentation/process/handling-regressions.rst?id=eed892da9cd08be76a8f467c600ef58716dbb4d2
>>
>
> I'd rather Chuck make the final call here.
Thanks! I feel this one needs broader testing than we can manage
in just a couple of days. If this were earlier in the -rc cycle
I would pull the patch right into 6.4-rc without hesitation. It
is obviously -rc material, but the timing is unfortunate.
I'm planning the nfsd for-6.5 pull request early in the merge
window, so practically speaking it shouldn't delay the finalized
upstream version of this patch by more than a few days.
> The original patch
> description didn't point out how easy it is to trigger a panic with
> this,
I will add that information.
> so I was hoping to convince him.
Oh, I agree it's significant. I just don't want to compound the
problem by sending a possibly-buggy patch at the last moment
in the 6.4 cycle.
When we have our shiny new CI infrastructure in place, we will
be able to move faster and with more confidence on fixes this
late in a cycle.
> To further that argument too:
>
> I have to wonder if this bug might cause (temporary?) memory corruption
> on x86_64. The code hits a spinlock in that struct, so there may be a
> window of time where it doesn't contain what's expected.
>
>>>>> Cc: stable@xxxxxxxxxxxxxxx # v6.3+
>>>>> Fixes: f5f9d4a314da ("nfsd: move reply cache initialization into nfsd startup")
>>>>
>>>> Why both Fixes: and Cc: stable?
>>>
>>> *shrug* : they mean different things. I can drop the Cc stable.
>>
>> Please leave it, only a stable tag ensures backporting; a fixes tag
>> alone is not enough. See [1] above or these recent messages from Greg:
>>
>> https://lore.kernel.org/all/2023061137-algorithm-almanac-1337@gregkh/
>> https://lore.kernel.org/all/2023060703-colony-shakily-3514@gregkh/
>
> Chuck and I also recently requested that the stable series not pick
> patches automatically for fs/nfsd.
To be clear, we requested that stable not pick up patches for
fs/nfsd using AUTOSEL. Basically that means stable will pick
up only fs/nfsd patches that are explicitly marked with Fixes:
or Cc:stable.
> This does need to be backported
> though, so I cc'ed stable to make that clear.
OK, I'll add the "cc: stable" back too.
My question wasn't so much a demand to drop the tag, but rather
a request for an explanation of why both were needed. I'll try
to be less terse next time.
Thorsten, if you've added this issue to the regbot database,
please feel free to follow up with the correct tags to mark the
issue closed as appropriate.
--
Chuck Lever