Re: [PATCH 5.15 00/23] 5.15.160-rc1 review

From: NeilBrown
Date: Wed May 29 2024 - 17:00:34 EST


On Wed, 29 May 2024, Jon Hunter wrote:
> On 29/05/2024 00:42, NeilBrown wrote:
> > On Wed, 29 May 2024, NeilBrown wrote:
> >>
> >> We probably just need to add "| TASK_FREEZABLE" in one or two places.
> >> I'll post a patch for testing in a little while.
> >
> > There is no TASK_FREEZABLE before v6.1.
> > This isn't due to a missed backport. It is simply because of differences
> > in the freezer in older kernels.
> >
> > Please test this patch.
> >
> > Thanks,
> > NeilBrown
> >
> > From 416bd6ae9a598e64931d34b76aa58f39b11841cd Mon Sep 17 00:00:00 2001
> > From: NeilBrown <neilb@xxxxxxx>
> > Date: Wed, 29 May 2024 09:38:22 +1000
> > Subject: [PATCH] sunrpc: exclude from freezer when waiting for requests:
> >
> > Prior to v6.1, the freezer will only wake a kernel thread from an
> > interruptible sleep. Since we changed svc_get_next_xprt() to use an
> > IDLE sleep, the freezer cannot wake it. We need to tell the freezer to
> > ignore it instead.
> >
> > Fixes: 9b8a8e5e8129 ("nfsd: don't allow nfsd threads to be signalled.")
> > Signed-off-by: NeilBrown <neilb@xxxxxxx>
> > ---
> > net/sunrpc/svc_xprt.c | 2 ++
> > 1 file changed, 2 insertions(+)
> >
> > diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
> > index b19592673eef..12e9293bd12b 100644
> > --- a/net/sunrpc/svc_xprt.c
> > +++ b/net/sunrpc/svc_xprt.c
> > @@ -764,10 +764,12 @@ static struct svc_xprt *svc_get_next_xprt(struct svc_rqst *rqstp, long timeout)
> >  	clear_bit(RQ_BUSY, &rqstp->rq_flags);
> >  	smp_mb__after_atomic();
> >
> > +	freezer_do_not_count();
> >  	if (likely(rqst_should_sleep(rqstp)))
> >  		time_left = schedule_timeout(timeout);
> >  	else
> >  		__set_current_state(TASK_RUNNING);
> > +	freezer_count();
> >
> >  	try_to_freeze();
> >
>
>
> Thanks. I gave this a try on top of v5.15.160-rc1, but I am still seeing
> the following and the board hangs ...
>
> Freezing of tasks failed after 20.004 seconds (1 tasks refusing to freeze, wq_busy=0):
>
> So unfortunately this does not fix it :-(

Thanks for testing.
I can only guess that you had an active NFSv4.1 mount and that the
callback thread was causing problems. Please try this. I also switched
to freezable_schedule*(), which seems like a better interface for doing
the same thing.
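
For anyone following along, here is a rough user-space model of why the
IDLE sleep trips up the old freezer and why marking the thread as "do not
count" helps. It is only a sketch: every model_* name is hypothetical, the
enum values are not the kernel's TASK_* constants, and the freezer logic
is reduced to the single behaviour described in the commit message above.

```c
#include <stdbool.h>

/* Hypothetical sleep states for this model; not the kernel's TASK_* values. */
enum model_sleep { MODEL_INTERRUPTIBLE, MODEL_IDLE };

struct model_kthread {
	enum model_sleep sleep;	/* how the thread is currently sleeping */
	bool freezer_skip;	/* stands in for the "do not count" marking */
	bool woken;		/* set when the model freezer wakes the thread */
};

/* Model of freezer_do_not_count(): ask the freezer to ignore this thread. */
static void model_freezer_do_not_count(struct model_kthread *t)
{
	t->freezer_skip = true;
}

/* Model of freezer_count(): the thread is visible to the freezer again. */
static void model_freezer_count(struct model_kthread *t)
{
	t->freezer_skip = false;
}

/*
 * One freezer pass over a sleeping kernel thread.
 * Returns true if the freezer no longer has to wait for this thread.
 */
static bool model_freeze_pass(struct model_kthread *t)
{
	if (t->freezer_skip)
		return true;		/* thread asked to be ignored */
	if (t->sleep == MODEL_INTERRUPTIBLE) {
		t->woken = true;	/* woken so it can go and freeze itself */
		return true;
	}
	return false;			/* IDLE sleeper: cannot be woken, freeze stalls */
}
```

In this model an IDLE sleeper makes model_freeze_pass() return false until
model_freezer_do_not_count() has been called, which mirrors the "1 tasks
refusing to freeze" symptom. As far as I recall, the freezable_schedule*()
helpers in pre-6.1 kernels do the same bracketing around the schedule call,
so they should behave like the explicit pair in the earlier patch.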

If this doesn't fix it, we'll probably need to ask someone who remembers
the old freezer code.

Thanks,
NeilBrown