Re: [PATCH 5.15 00/23] 5.15.160-rc1 review
From: Chuck Lever III
Date: Tue May 28 2024 - 20:14:31 EST
> On May 28, 2024, at 7:44 PM, NeilBrown <neilb@xxxxxxx> wrote:
>
> On Wed, 29 May 2024, Chuck Lever III wrote:
>>
>>
>>> On May 28, 2024, at 6:01 PM, NeilBrown <neilb@xxxxxxx> wrote:
>>>
>>> On Wed, 29 May 2024, Chuck Lever III wrote:
>>>>
>>>>
>>>>> On May 28, 2024, at 10:18 AM, Jon Hunter <jonathanh@xxxxxxxxxx> wrote:
>>>>>
>>>>>
>>>>> On 28/05/2024 14:14, Chuck Lever III wrote:
>>>>>>> On May 28, 2024, at 5:04 AM, Jon Hunter <jonathanh@xxxxxxxxxx> wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 25/05/2024 15:20, Greg Kroah-Hartman wrote:
>>>>>>>> On Sat, May 25, 2024 at 12:13:28AM +0100, Jon Hunter wrote:
>>>>>>>>> Hi Greg,
>>>>>>>>>
>>>>>>>>> On 23/05/2024 14:12, Greg Kroah-Hartman wrote:
>>>>>>>>>> This is the start of the stable review cycle for the 5.15.160 release.
>>>>>>>>>> There are 23 patches in this series, all will be posted as a response
>>>>>>>>>> to this one. If anyone has any issues with these being applied, please
>>>>>>>>>> let me know.
>>>>>>>>>>
>>>>>>>>>> Responses should be made by Sat, 25 May 2024 13:03:15 +0000.
>>>>>>>>>> Anything received after that time might be too late.
>>>>>>>>>>
>>>>>>>>>> The whole patch series can be found in one patch at:
>>>>>>>>>> https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.160-rc1.gz
>>>>>>>>>> or in the git tree and branch at:
>>>>>>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y
>>>>>>>>>> and the diffstat can be found below.
>>>>>>>>>>
>>>>>>>>>> thanks,
>>>>>>>>>>
>>>>>>>>>> greg k-h
>>>>>>>>>>
>>>>>>>>>> -------------
>>>>>>>>>> Pseudo-Shortlog of commits:
>>>>>>>>>
>>>>>>>>> ...
>>>>>>>>>
>>>>>>>>>> NeilBrown <neilb@xxxxxxx>
>>>>>>>>>> nfsd: don't allow nfsd threads to be signalled.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I am seeing a suspend regression on a couple boards and bisect is pointing
>>>>>>>>> to the above commit. Reverting this commit does fix the issue.
>>>>>>>> Ugh, that fixes the report from others. Can you cc: everyone on that
>>>>>>>> and figure out what is going on, as this keeps going back and forth...
>>>>>>>
>>>>>>>
>>>>>>> Adding Chuck, Neil and Chris from the bug report here [0].
>>>>>>>
>>>>>>> With the above applied to v5.15.y, I am seeing suspend fail on 2 of our boards. These boards are using NFS, and on entry to suspend I am now seeing ...
>>>>>>>
>>>>>>> Freezing of tasks failed after 20.002 seconds (1 tasks refusing to
>>>>>>> freeze, wq_busy=0):
>>>>>>>
>>>>>>> The boards appear to hang at that point. So maybe something else is missing?
>>>>>> Note that we don't have access to hardware like this, so
>>>>>> we haven't tested that patch (even the upstream version)
>>>>>> with suspend on that hardware.
>>>>>
>>>>>
>>>>> No problem, I would not expect you to have this particular hardware :-)
>>>>>
>>>>>> So, it could be something missing, or it could be that
>>>>>> patch has a problem.
>>>>>> It would help us to know if you observe the same issue
>>>>>> with an upstream kernel, if that is possible.
>>>>>
>>>>>
>>>>> I don't observe this with mainline, -next, or any other stable branch. So that would suggest that something else is missing from linux-5.15.y.
>>>>
>>>> That helps. It would be very helpful to have a reproducer I can
>>>> use to confirm we have a fix. I'm sure this will be a process
>>>> that involves a non-trivial number of iterations.
>>>
>>> Missing upstream patch is
>>>
>>> Commit 9bd4161c5917 ("SUNRPC: change service idle list to be an llist")
>>>
>>> This contains some freezer-related changes which probably should
>>> have been a separate patch.
>>
>> Thanks for tracking that down.
>>
>>
>>> We probably just need to add "| TASK_FREEZABLE" in one or two places.
>>> I'll post a patch for testing in a little while.
>>
>> My understanding is that the stable maintainers prefer a backport
>> of a patch (or patches) that are already applied to Linus' tree.
>
> They also preferred a full backport of fs/nfsd/.. That hasn't gone so
> well :-)

Really? I count about 350 patches in the initial backport. Those patches
include nearly every NFSD patch from v5.16 up to the end of v6.2. We
agreed to stop once the filecache fixes had been applied; no-one asked
for a "full backport" from torvalds/HEAD.

Only two more patches have been applied since then. Three if you count
this one. All of these issues have been very narrow corner cases or
obscure test failures.

That is quite good, if you ask me. I don't see a problem, given the
monumental task and the lack of NFSD stable testing infrastructure before
I began.

> In this case we would need
>
> Commit f5d39b020809 ("freezer,sched: Rewrite core freezer logic")
>
> to get TASK_FREEZABLE.
> I doubt that would be a good choice.

I will let Greg and Sasha decide how they want to proceed, but it
would be wise to include this detail in your patch description.
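
For readers following the TASK_FREEZABLE point, here is a loose userspace model in Python (not kernel code) of the check that the rewritten freezer from commit f5d39b020809 applies to a sleeping task. It assumes the rewrite behaves as summarized here: a sleeping task is marked frozen directly only if its sleep state carries TASK_FREEZABLE (or the task is stopped or traced), and TASK_FREEZABLE is only meant to augment a normal sleep state. The bit values are assumptions chosen for illustration, and the model ignores everything else the freezer does, including waking a task so that it can call try_to_freeze() itself.

```python
from enum import IntFlag


class TaskState(IntFlag):
    """Hypothetical subset of task-state bits; the values are illustrative only."""
    INTERRUPTIBLE = 0x0001
    UNINTERRUPTIBLE = 0x0002
    STOPPED = 0x0004
    TRACED = 0x0008
    NOLOAD = 0x0400
    FREEZABLE = 0x2000


# "Normal" sleep: either interruptible or uninterruptible.
NORMAL = TaskState.INTERRUPTIBLE | TaskState.UNINTERRUPTIBLE
# Idle sleep: uninterruptible, but not counted in the load average.
IDLE = TaskState.UNINTERRUPTIBLE | TaskState.NOLOAD


def can_freeze_sleeper(state: TaskState) -> bool:
    """Return True if this model lets a sleeping task be marked frozen directly."""
    if state & TaskState.FREEZABLE and not state & NORMAL:
        # In this model, FREEZABLE is only valid on top of a normal sleep state.
        raise ValueError("FREEZABLE without a normal sleep state")
    # Only FREEZABLE, stopped, or traced sleepers are frozen directly here.
    return bool(state & (TaskState.FREEZABLE | TaskState.STOPPED | TaskState.TRACED))


if __name__ == "__main__":
    for name, state in [
        ("TASK_IDLE", IDLE),
        ("TASK_IDLE | TASK_FREEZABLE", IDLE | TaskState.FREEZABLE),
        ("TASK_INTERRUPTIBLE", TaskState.INTERRUPTIBLE),
    ]:
        print(f"{name}: {can_freeze_sleeper(state)}")
```

In this model, an idle sleeper is not frozen directly, while the same state with FREEZABLE added is, which is the kind of one-flag change suggested above. On 5.15 that flag does not exist without f5d39b020809.
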
--
Chuck Lever