Re: [Patch 0/2] NFSD: Fix server hang when there are multiple layout conflicts
From: Dai Ngo
Date: Tue Nov 11 2025 - 10:44:18 EST
On 11/11/25 7:34 AM, Chuck Lever wrote:
> On 11/11/25 10:24 AM, Dai Ngo wrote:
>>> Last thought (for now): I think Neil has some work for dynamic knfsd
>>> thread count.. or Jeff? (I am having trouble finding it) Would that
>>> work around this problem?
>>
>> This would help, and I prefer this route rather than rework __break_lease
>> to return EAGAIN/jukebox while the server recalling the layout.
>
> Jeff is looking at continuing Neil's work in this area.
>
> Adding more threads, IMHO, is not a good long term solution for this
> particular issue. There's no guarantee that the server won't get stuck
> no matter how many threads are created, and practically speaking, there
> are only so many threads that can be created before the server goes
> belly up. Or put another way, there's no way to formally prove that the
> server will always be able to make forward progress with this solution.
>
> We want NFSD to have a generic mechanism for deferring work so that an
> nfsd thread never waits more than a few dozen milliseconds for anything.
> This is the tactic NFSD uses for delegation recalls, for example.

I think we need both: (1) a dynamic number of server threads and (2) the
ability to defer work, as we currently do for delegation recalls. I
think we need (1) first, since it applies to general server operations
and not just layout recalls.
Even with both of these enhancements, we would still need to enforce a
timeout in __break_lease, since we don't want to wait for the recall
forever.
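
To make the combination of deferral and a timeout concrete, here is a
rough user-space sketch. It is not kernel code and not what the patches
do; every name in it (struct recall, recall_check, the status values) is
hypothetical. The idea is that a thread never blocks on the recall: it
checks the state, and either proceeds, asks the client to retry later,
or gives up once an overall deadline has passed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical status codes for this sketch; not kernel definitions. */
enum recall_status {
	RECALL_DONE,      /* layout was returned; the request can proceed */
	RECALL_RETRY,     /* still pending; reply "try again later" */
	RECALL_TIMED_OUT  /* overall deadline passed; stop waiting */
};

/* Hypothetical bookkeeping for one outstanding layout recall. */
struct recall {
	bool     returned;    /* set once the client returns the layout */
	uint64_t started_ms;  /* time the recall was sent */
	uint64_t deadline_ms; /* maximum total time to wait for the recall */
};

/*
 * Non-blocking check: the caller never sleeps here. While the recall is
 * pending and the deadline has not passed, the caller is expected to
 * release its thread and let the client retry.
 */
enum recall_status recall_check(const struct recall *r, uint64_t now_ms)
{
	if (r->returned)
		return RECALL_DONE;
	if (now_ms - r->started_ms >= r->deadline_ms)
		return RECALL_TIMED_OUT;
	return RECALL_RETRY;
}
```

What the server should do on RECALL_TIMED_OUT is outside this sketch;
the point is only that the wait is bounded no matter how many threads
exist.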
-Dai