Re: [Patch 0/2] NFSD: Fix server hang when there are multiple layout conflicts
From: Christoph Hellwig
Date: Tue Nov 11 2025 - 10:36:45 EST
On Tue, Nov 11, 2025 at 10:34:04AM -0500, Chuck Lever wrote:
> > This would help, and I prefer this route rather than reworking
> > __break_lease to return EAGAIN/jukebox while the server is recalling
> > the layout.
>
> Jeff is looking at continuing Neil's work in this area.
>
> Adding more threads, IMHO, is not a good long term solution for this
> particular issue. There's no guarantee that the server won't get stuck
> no matter how many threads are created, and practically speaking, there
> are only so many threads that can be created before the server goes
> belly up. Or put another way, there's no way to formally prove that the
> server will always be able to make forward progress with this solution.

Agreed.

> We want NFSD to have a generic mechanism for deferring work so that an
> nfsd thread never waits more than a few dozen milliseconds for anything.
> This is the tactic NFSD uses for delegation recalls, for example.

Agreed.  This would also apply to I/O itself: with O_DIRECT we can
fully support direct I/O, and even with buffered I/O there is some
limited non-blocking read and write support.
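
To make the "never wait more than a few dozen milliseconds" idea
concrete, here is a rough userspace sketch of a bounded wait.  This is
not NFSD code: struct conflict, wait_bounded() and the status names are
all made up for illustration, and a pthread condition variable stands in
for whatever the kernel side would really use.  The worker waits for the
conflict to clear until a deadline, and if it has not cleared by then it
reports "retry later" (what the quoted text calls EAGAIN/jukebox)
instead of blocking indefinitely.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical status codes, not taken from the NFSD sources. */
enum req_status { REQ_DONE, REQ_RETRY_LATER };

/* Hypothetical stand-in for an outstanding layout or delegation recall. */
struct conflict {
	pthread_mutex_t lock;
	pthread_cond_t  resolved_cv;
	bool            resolved;	/* set once the recall has completed */
};

/*
 * Wait at most timeout_ms for the conflict to clear.  On timeout the
 * caller gets REQ_RETRY_LATER and is free to serve other requests.
 */
static enum req_status wait_bounded(struct conflict *c, long timeout_ms)
{
	struct timespec deadline;
	bool ok;
	int err = 0;

	/* pthread_cond_timedwait() takes an absolute CLOCK_REALTIME deadline. */
	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec  += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&c->lock);
	/* Loop to absorb spurious wakeups; stop once the deadline passes. */
	while (!c->resolved && err != ETIMEDOUT)
		err = pthread_cond_timedwait(&c->resolved_cv, &c->lock, &deadline);
	ok = c->resolved;
	pthread_mutex_unlock(&c->lock);

	return ok ? REQ_DONE : REQ_RETRY_LATER;
}
```

The point of the sketch is only that the wait is bounded by a fixed
deadline rather than by the number of worker threads, so a stuck recall
costs each thread a fixed amount of time at most.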