Re: [PATCH] sunrpc: add a rpc_clnt shutdown control in debugfs

From: Trond Myklebust
Date: Wed Mar 12 2025 - 18:32:03 EST


On Wed, 2025-03-12 at 10:37 -0400, Jeff Layton wrote:
> On Wed, 2025-03-12 at 09:52 -0400, Benjamin Coddington wrote:
> > On 12 Mar 2025, at 9:36, Jeff Layton wrote:
> >
> > > There have been confirmed reports where a container with an NFS
> > > mount
> > > inside it dies abruptly, along with all of its processes, but the
> > > NFS
> > > client sticks around and keeps trying to send RPCs after the
> > > networking
> > > is gone.
> > >
> > > We have a reproducer where if we SIGKILL a container with an NFS
> > > mount,
> > > the RPC clients will stick around indefinitely. The orchestrator
> > > does a MNT_DETACH unmount on the NFS mount, and then tears down
> > > the
> > > networking while there are still RPCs in flight.
> > >
> > > Recently new controls were added[1] that allow shutting down an
> > > NFS
> > > mount. That doesn't help here since the mount namespace is
> > > detached from
> > > any tasks at this point.
> >
> > That's interesting - seems like the orchestrator could just reorder
> > its
> > request to shutdown before detaching the mount namespace.  Not an
> > objection,
> > just wondering why the MNT_DETACH must come first.
> >
>
> The reproducer we have is to systemd-nspawn a container, mount up an
> NFS mount inside it, start some I/O on it with fio and then kill -9
> the
> systemd running inside the container. There isn't much the
> orchestrator
> (root-level systemd) can do to at that point other than clean up
> what's
> left.
>
> I'm still working on a way to reliably detect when this has happened.
> For now, we just have to notice that some clients aren't dying.
>
> > > Transplant shutdown_client() to the sunrpc module, and give it a
> > > more
> > > distinct name. Add a new debugfs sunrpc/rpc_clnt/*/shutdown knob
> > > that
> > > allows the same functionality as the one in /sys/fs/nfs, but at
> > > the
> > > rpc_clnt level.
> > >
> > > [1]: commit d9615d166c7e ("NFS: add sysfs shutdown knob").
> > >
> > > Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx>
> >
> > I have a TODO to patch Documentation/ for this knob mostly to write
> > warnings
> > because there are some potential "gotchas" here - for example you
> > can have
> > shared RPC clients and shutting down one of those can cause
> > problems for a
> > different mount (this is true today with the
> > /sys/fs/nfs/[bdi]/shutdown
> > knob).  Shutting down aribitrary clients will definitely break
> > things in
> > weird ways, its not a safe place to explore.
> >
>
> Yes, you really do need to know what you're doing. 0200 permissions
> are
> essential for this file, IOW. Thanks for the R-b!

Sorry, but NACK! We should not be adding control mechanisms to debugfs.

One thing that might work in situations like this is perhaps to make
use of the fact that we are monitoring whether or not rpc_pipefs is
mounted. So if the mount is containerised, and the orchestrator
unmounts everything, including rpc_pipefs, we might take that as a hint
that we should treat any future connection errors as being fatal.

Otherwise, we'd have to be able to monitor the root task, and check if
it is still alive in order to figure out if out containerised world has
collapsed.

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx