Re: [PATCH 5.15 00/23] 5.15.160-rc1 review

From: NeilBrown
Date: Tue May 28 2024 - 18:01:30 EST


On Wed, 29 May 2024, Chuck Lever III wrote:
>
>
> > On May 28, 2024, at 10:18 AM, Jon Hunter <jonathanh@xxxxxxxxxx> wrote:
> >
> >
> > On 28/05/2024 14:14, Chuck Lever III wrote:
> >>> On May 28, 2024, at 5:04 AM, Jon Hunter <jonathanh@xxxxxxxxxx> wrote:
> >>>
> >>>
> >>> On 25/05/2024 15:20, Greg Kroah-Hartman wrote:
> >>>> On Sat, May 25, 2024 at 12:13:28AM +0100, Jon Hunter wrote:
> >>>>> Hi Greg,
> >>>>>
> >>>>> On 23/05/2024 14:12, Greg Kroah-Hartman wrote:
> >>>>>> This is the start of the stable review cycle for the 5.15.160 release.
> >>>>>> There are 23 patches in this series, all will be posted as a response
> >>>>>> to this one. If anyone has any issues with these being applied, please
> >>>>>> let me know.
> >>>>>>
> >>>>>> Responses should be made by Sat, 25 May 2024 13:03:15 +0000.
> >>>>>> Anything received after that time might be too late.
> >>>>>>
> >>>>>> The whole patch series can be found in one patch at:
> >>>>>> https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.160-rc1.gz
> >>>>>> or in the git tree and branch at:
> >>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y
> >>>>>> and the diffstat can be found below.
> >>>>>>
> >>>>>> thanks,
> >>>>>>
> >>>>>> greg k-h
> >>>>>>
> >>>>>> -------------
> >>>>>> Pseudo-Shortlog of commits:
> >>>>>
> >>>>> ...
> >>>>>
> >>>>>> NeilBrown <neilb@xxxxxxx>
> >>>>>> nfsd: don't allow nfsd threads to be signalled.
> >>>>>
> >>>>>
> >>>>> I am seeing a suspend regression on a couple of boards and bisect is pointing
> >>>>> to the above commit. Reverting this commit does fix the issue.
> >>>> Ugh, that fixes the report from others. Can you cc: everyone on that
> >>>> and figure out what is going on, as this keeps going back and forth...
> >>>
> >>>
> >>> Adding Chuck, Neil and Chris from the bug report here [0].
> >>>
> >>> With the above applied to v5.15.y, I am seeing suspend fail on 2 of our
> >>> boards. These boards are using NFS and on entry to suspend I am now seeing:
> >>>
> >>> Freezing of tasks failed after 20.002 seconds (1 tasks refusing to
> >>> freeze, wq_busy=0):
> >>>
> >>> The boards appear to hang at that point. So maybe something else is missing?
> >> Note that we don't have access to hardware like this, so
> >> we haven't tested that patch (even the upstream version)
> >> with suspend on that hardware.
> >
> >
> > No problem, I would not expect you to have this particular hardware :-)
> >
> >> So, it could be something missing, or it could be that
> >> patch has a problem.
> >> It would help us to know if you observe the same issue
> >> with an upstream kernel, if that is possible.
> >
> >
> > I don't observe this with mainline, -next or any other stable branch.
> > So that would suggest that something else is missing from linux-5.15.y.
>
> That helps. It would be very helpful to have a reproducer I can
> use to confirm we have a fix. I'm sure this will be a process
> that involves a non-trivial number of iterations.

The missing upstream patch is

Commit 9bd4161c5917 ("SUNRPC: change service idle list to be an llist")

This contains some freezer-related changes which probably should
have been a separate patch.

We probably just need to add "| TASK_FREEZABLE" in one or two places.
I'll post a patch for testing in a little while.

NeilBrown