Re: [syzbot] [nfs?] INFO: task hung in nfsd_nl_listener_set_doit

From: NeilBrown
Date: Fri Oct 11 2024 - 17:09:07 EST


On Sat, 12 Oct 2024, Chuck Lever III wrote:
>
>
> > On Oct 9, 2024, at 4:26 PM, Jeff Layton <jlayton@xxxxxxxxxx> wrote:
> >
> > On Wed, 2024-09-04 at 10:23 -0400, Chuck Lever wrote:
> >> On Mon, Sep 02, 2024 at 11:57:55AM +1000, NeilBrown wrote:
> >>> On Sun, 01 Sep 2024, syzbot wrote:
> >>>> syzbot has found a reproducer for the following issue on:
> >>>
> >>> I had a poke around using the provided disk image and kernel for
> >>> exploring.
> >>>
> >>> I think the problem is demonstrated by this stack :
> >>>
> >>> [<0>] rpc_wait_bit_killable+0x1b/0x160
> >>> [<0>] __rpc_execute+0x723/0x1460
> >>> [<0>] rpc_execute+0x1ec/0x3f0
> >>> [<0>] rpc_run_task+0x562/0x6c0
> >>> [<0>] rpc_call_sync+0x197/0x2e0
> >>> [<0>] rpcb_register+0x36b/0x670
> >>> [<0>] svc_unregister+0x208/0x730
> >>> [<0>] svc_bind+0x1bb/0x1e0
> >>> [<0>] nfsd_create_serv+0x3f0/0x760
> >>> [<0>] nfsd_nl_listener_set_doit+0x135/0x1a90
> >>> [<0>] genl_rcv_msg+0xb16/0xec0
> >>> [<0>] netlink_rcv_skb+0x1e5/0x430
> >>>
> >>> No rpcbind is running on this host, so that "svc_unregister" call
> >>> takes a long time. Maybe not forever, but if a few of these get
> >>> queued up, each blocking some other thread, that may have pushed it
> >>> over the hung-task limit.
> >>>
> >>> The fact that rpcbind is not running might not be relevant, as the
> >>> test messes up the network: even "ping 127.0.0.1" stops working.
> >>>
> >>> So this bug comes down to "we try to contact rpcbind while holding a
> >>> mutex and if that gets no response and no error, then we can hold the
> >>> mutex for a long time".
> >>>
> >>> Are we surprised? Do we want to fix this? Any suggestions how?
> >>
> >> In the past, we've tried to address "hanging upcall" issues where
> >> the kernel part of an administrative command needs a user space
> >> service that isn't working or present (e.g. mount needing a running
> >> gssd).
> >>
> >> If NFSD is using the kernel RPC client for the upcall, then
> >> adding the RPC_TASK_SOFTCONN flag might turn the hang into an
> >> immediate failure.
> >>
> >> IMO this should be addressed.
> >>
> >>
> >
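For context, the upcall bottoms out in rpc_call_sync() (see the stack
trace above), so the flag would be applied at a call site roughly like
this. An illustration only, not the actual code in
net/sunrpc/rpcb_clnt.c:

static int example_rpcb_call(struct rpc_clnt *clnt,
			     struct rpc_message *msg)
{
	/*
	 * RPC_TASK_SOFTCONN makes a failed connection attempt return
	 * an error immediately instead of retrying, so an absent or
	 * unreachable rpcbind fails fast rather than hanging the
	 * caller.
	 */
	return rpc_call_sync(clnt, msg, RPC_TASK_SOFTCONN);
}
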
> > I sent a patch that does the above, but now I'm wondering if we ought
> > to take another approach. The listener array can be pretty long. What
> > if we instead were to just drop and reacquire the mutex in the loop at
> > strategic points? Then we wouldn't squat on the mutex for so long.
> >
> > Something like this maybe? It's ugly but it might prevent hung task
> > warnings, and listener setup isn't a fastpath anyway.
> >
> >
> > diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
> > index 3adbc05ebaac..5de01fb4c557 100644
> > --- a/fs/nfsd/nfsctl.c
> > +++ b/fs/nfsd/nfsctl.c
> > @@ -2042,7 +2042,9 @@ int nfsd_nl_listener_set_doit(struct sk_buff *skb, struct genl_info *info)
> >
> >  		set_bit(XPT_CLOSE, &xprt->xpt_flags);
> >  		spin_unlock_bh(&serv->sv_lock);
> >
> >  		svc_xprt_close(xprt);
> > +
> > +		/* ensure we don't squat on the mutex for too long */
> > +		mutex_unlock(&nfsd_mutex);
> > +		mutex_lock(&nfsd_mutex);
> >  		spin_lock_bh(&serv->sv_lock);
> >  	}
> >
> > @@ -2082,6 +2084,10 @@ int nfsd_nl_listener_set_doit(struct sk_buff *skb, struct genl_info *info)
> >  		/* always save the latest error */
> >  		if (ret < 0)
> >  			err = ret;
> > +
> > +		/* ensure we don't squat on the mutex for too long */
> > +		mutex_unlock(&nfsd_mutex);
> > +		mutex_lock(&nfsd_mutex);
> >  	}
> >
> >  	if (!serv->sv_nrthreads && list_empty(&nn->nfsd_serv->sv_permsocks))
>
> I had a look at the rpcb upcall code a couple of weeks ago.
> I'm not convinced that setting SOFTCONN in all cases will
> help, but unfortunately the reasons for my skepticism have
> all but leaked out of my head.
>
> Releasing and re-acquiring the mutex is often a sign of
> a deeper problem. I think you're in the right vicinity,
> but I'd like to better understand the actual cause of
> the delay. The listener list shouldn't be all that long,
> but maybe it has an unintentional loop in it?

I think it is wrong to register with rpcbind while holding a mutex.
Registering with rpcbind doesn't need to be synchronous, does it? Could
we punt that to a workqueue? Do we need to get a failure status back
somehow?? wait_for_completion_killable() somewhere?? Something like the
sketch below, perhaps.
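
Very roughly, and completely untested - every name here (svc_rpcb_work,
svc_rpcb_workfn, svc_rpcb_register_async) is invented for illustration,
and svc_rpcb_setup() is currently a static helper in net/sunrpc/svc.c
that would need exporting or wrapping:

struct svc_rpcb_work {
	struct work_struct work;
	struct svc_serv *serv;
	struct net *net;
	int status;
	struct completion done;
};

static void svc_rpcb_workfn(struct work_struct *work)
{
	struct svc_rpcb_work *rw =
		container_of(work, struct svc_rpcb_work, work);

	/* runs without nfsd_mutex held */
	rw->status = svc_rpcb_setup(rw->serv, rw->net);
	complete(&rw->done);
}

static int svc_rpcb_register_async(struct svc_serv *serv, struct net *net)
{
	struct svc_rpcb_work rw = { .serv = serv, .net = net };
	int ret;

	INIT_WORK_ONSTACK(&rw.work, svc_rpcb_workfn);
	init_completion(&rw.done);
	queue_work(system_unbound_wq, &rw.work);

	/* the caller must not hold nfsd_mutex across this wait */
	ret = wait_for_completion_killable(&rw.done);
	if (ret) {
		/*
		 * Killed while waiting.  The work item may still be
		 * using the on-stack structure, so we must flush it
		 * before returning; a real version would heap-allocate
		 * and refcount so the waiter could bail out here
		 * without blocking again.
		 */
		flush_work(&rw.work);
	}
	destroy_work_on_stack(&rw.work);
	return ret ?: rw.status;
}

That would keep the rpcbind upcall off nfsd_mutex entirely, and the
completion hands the caller the failure status it needs.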

>
> I wish we had a reproducer for these issues.

We do, I think. I downloaded the provided kernel and root image, ran
the reproducer supplied (or maybe an earlier version), and it triggered
quite easily.

NeilBrown

>
>
> --
> Chuck Lever
>
>
>