Re: [Patch 0/2] NFSD: Fix server hang when there are multiple layout conflicts

From: Christoph Hellwig

Date: Fri Nov 07 2025 - 08:30:50 EST


On Thu, Nov 06, 2025 at 09:05:24AM -0800, Dai Ngo wrote:
> When a layout conflict triggers a call to __break_lease, the function
> nfsd4_layout_lm_break clears the fl_break_time timeout before sending
> the CB_LAYOUTRECALL. As a result, __break_lease repeatedly restarts
> its loop, waiting indefinitely for the conflicting file lease to be
> released.
>
> If the number of lease conflicts matches the number of NFSD threads (which
> defaults to 8), all available NFSD threads become occupied. Consequently,
> there are no threads left to handle incoming requests or callback replies,
> leading to a total hang of the NFS server.
>
> This issue is reliably reproducible by running the Git test suite on a
> configuration using SCSI layout.

I guess we need to implement asynchronous breaking of leases, which
conceptually shouldn't be too hard.
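
As a rough illustration of the starvation described in the quoted text,
here is a toy user-space model. It is not kernel code: the struct and
helper names (nfsd_pool_model, blocked_threads, free_threads,
pool_is_hung) are made up for this sketch. It assumes that each
unresolved layout conflict pins exactly one worker thread in a wait that
never times out.

```c
#include <stdbool.h>

/*
 * Hypothetical toy model, not kernel code: every outstanding layout
 * conflict is assumed to pin one worker in a wait with no timeout.
 */
struct nfsd_pool_model {
	unsigned int nr_threads;	/* size of the worker pool */
	unsigned int nr_conflicts;	/* outstanding layout conflicts */
};

/* Workers stuck waiting: one per conflict, capped by the pool size. */
static unsigned int blocked_threads(const struct nfsd_pool_model *m)
{
	return m->nr_conflicts < m->nr_threads ? m->nr_conflicts
					       : m->nr_threads;
}

/* Workers still able to serve requests or callback replies. */
static unsigned int free_threads(const struct nfsd_pool_model *m)
{
	return m->nr_threads - blocked_threads(m);
}

/*
 * The pool is hung once no worker is left to process the callback
 * reply that would let a blocked worker make progress.
 */
static bool pool_is_hung(const struct nfsd_pool_model *m)
{
	return free_threads(m) == 0;
}
```

With nr_threads set to 8 (the default mentioned in the quoted text),
pool_is_hung() becomes true as soon as nr_conflicts reaches 8. If the
lease break were asynchronous, the waiting would not occupy a worker,
so in this model the blocked count would no longer grow with the number
of conflicts.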