Re: [PATCH v5 2/2] xfs: prevent close() from hanging on frozen filesystems

From: Aditya Prakash Srivastava

Date: Wed Jun 24 2026 - 14:28:33 EST


Hi Darrick,

Thanks for the feedback and review! I hope you had a great
vacation.

I looked closely at wrapping the block inside xfs_file_release()
with sb_start_write_trylock(), but I believe sticking with the
transaction flag approach is much more robust for a few reasons:

Lock Ordering Safety & Lockdep:
In XFS, the canonical locking hierarchy is:

sb_start_write (freeze lock) -> IOLOCK -> ILOCK ->
xfs_trans_alloc

xfs_file_release() already holds XFS_IOLOCK_EXCL (acquired via
xfs_ilock_nowait()). Attempting to acquire
sb_start_write_trylock() inside that block would mean acquiring
the freeze lock while holding the IOLOCK. Even though a trylock
itself does not block, this is a direct inversion of the
canonical locking hierarchy and is highly likely to trigger
lockdep dependency warnings.

Keeping the check inside xfs_trans_alloc() relies on
well-defined, centralized write-count evaluation, avoiding any
lockdep headaches.

Encapsulation and Readability:
Open-coding the freeze trylock at the VFS release boundary makes
xfs_file_release() quite fragile and harder to reason about
over time. The resulting nested conditional block is prone to
subtle bugs during future refactoring of the file release and
blockgc paths.

Architectural Consistency:
xfs_trans_alloc() is the canonical, centralized place in XFS to
manage write-count blocking semantics. The
XFS_TRANS_WRITECOUNT_TRYLOCK flag provides an explicit, clean
interface to tell the transaction layer exactly how to handle a
freeze, rather than bypassing it at a higher layer.

Given that this keeps the VFS/XFS interface clean and Christoph has
already reviewed/endorsed this version, I'd prefer to move forward
with the current iteration unless there is a strict functional
downside you see to the flag approach.

Thanks again for the review and guidance!

Best regards,
Aditya

On Wed, Jun 24, 2026 at 11:05 PM Darrick J. Wong <djwong@xxxxxxxxxx> wrote:
>
> On Tue, Jun 16, 2026 at 05:38:50AM +0000, Aditya Srivastava wrote:
> > From: Aditya Prakash Srivastava <aditya.ansh182@xxxxxxxxx>
> >
> > When a file with active speculative post-EOF preallocations is closed,
> > xfs_file_release() synchronously triggers xfs_free_eofblocks() to clean
> > them up. This requires allocating a write transaction (xfs_trans_alloc),
> > which blocks indefinitely if the filesystem is currently frozen or in the
> > process of freezing, as it waits to acquire the superblock's write lock.
> >
> > As a result, a close() system call on a read-write file descriptor can
> > hang indefinitely in percpu_rwsem_wait() until the filesystem is thawed,
> > even if the file is closed by a non-writer process or after all writing
> > activity has already ceased.
> >
> > To fix this properly and avoid any potential race conditions where a freeze
> > might come in immediately after a writable check, pass the new
> > XFS_TRANS_WRITECOUNT_TRYLOCK flag to xfs_trans_alloc() when freeing
> > speculative preallocations in xfs_file_release().
> >
> > If xfs_free_eofblocks() returns -EAGAIN on a trylock failure, we cleanly
> > bypass setting XFS_EOFBLOCKS_RELEASED on the inode, ensuring subsequent
> > releases or the background blockgc garbage collector can successfully retry
> > the cleanup once the filesystem thaws.
> >
> > Also, add the new trans_flags parameter to xfs_free_eofblocks() to make
> > its usage stand out, and update existing callers to pass 0 to preserve
> > standard blocking paths.
> >
> > Link: https://bugzilla.kernel.org/show_bug.cgi?id=205833
> > Link: https://bugzilla.redhat.com/show_bug.cgi?id=1474726
> > Suggested-by: Christoph Hellwig <hch@xxxxxxxxxxxxx>
> > Signed-off-by: Aditya Prakash Srivastava <aditya.ansh182@xxxxxxxxx>
> > ---
> > fs/xfs/xfs_bmap_util.c | 10 ++++++----
> > fs/xfs/xfs_bmap_util.h | 2 +-
> > fs/xfs/xfs_file.c | 8 +++++---
> > fs/xfs/xfs_icache.c | 2 +-
> > fs/xfs/xfs_inode.c | 2 +-
> > 5 files changed, 14 insertions(+), 10 deletions(-)
> >
> > diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
> > index 0ab00615f1ad..a99aae4a1631 100644
> > --- a/fs/xfs/xfs_bmap_util.c
> > +++ b/fs/xfs/xfs_bmap_util.c
> > @@ -574,7 +574,8 @@ xfs_can_free_eofblocks(
> > */
> > int
> > xfs_free_eofblocks(
> > - struct xfs_inode *ip)
> > + struct xfs_inode *ip,
> > + uint trans_flags)
> > {
> > struct xfs_trans *tp;
> > struct xfs_mount *mp = ip->i_mount;
> > @@ -604,9 +605,10 @@ xfs_free_eofblocks(
> > return 0;
> > }
> >
> > - error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, 0, 0, 0, &tp);
> > + error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, 0, 0,
> > + trans_flags, &tp);
> > if (error) {
> > - ASSERT(xfs_is_shutdown(mp));
> > + ASSERT(error == -EAGAIN || xfs_is_shutdown(mp));
> > return error;
> > }
> >
> > @@ -928,7 +930,7 @@ xfs_prepare_shift(
> > * into the accessible region of the file.
> > */
> > if (xfs_can_free_eofblocks(ip)) {
> > - error = xfs_free_eofblocks(ip);
> > + error = xfs_free_eofblocks(ip, 0);
> > if (error)
> > return error;
> > }
> > diff --git a/fs/xfs/xfs_bmap_util.h b/fs/xfs/xfs_bmap_util.h
> > index c477b3361630..c13774aa0892 100644
> > --- a/fs/xfs/xfs_bmap_util.h
> > +++ b/fs/xfs/xfs_bmap_util.h
> > @@ -66,7 +66,7 @@ int xfs_insert_file_space(struct xfs_inode *, xfs_off_t offset,
> >
> > /* EOF block manipulation functions */
> > bool xfs_can_free_eofblocks(struct xfs_inode *ip);
> > -int xfs_free_eofblocks(struct xfs_inode *ip);
> > +int xfs_free_eofblocks(struct xfs_inode *ip, uint trans_flags);
> >
> > int xfs_swap_extents(struct xfs_inode *ip, struct xfs_inode *tip,
> > struct xfs_swapext *sx);
> > diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> > index 845a97c9b063..76c9b2fe7c51 100644
> > --- a/fs/xfs/xfs_file.c
> > +++ b/fs/xfs/xfs_file.c
> > @@ -1806,9 +1806,11 @@ xfs_file_release(
> > */
> > if (!xfs_iflags_test(ip, XFS_EOFBLOCKS_RELEASED) &&
> > xfs_ilock_nowait(ip, XFS_IOLOCK_EXCL)) {
> > - if (xfs_can_free_eofblocks(ip) &&
> > - !xfs_iflags_test_and_set(ip, XFS_EOFBLOCKS_RELEASED))
> > - xfs_free_eofblocks(ip);
> > + if (!xfs_iflags_test(ip, XFS_EOFBLOCKS_RELEASED) &&
> > + xfs_can_free_eofblocks(ip) &&
> > + !xfs_free_eofblocks(ip, XFS_TRANS_WRITECOUNT_TRYLOCK))
> > + xfs_iflags_set(ip, XFS_EOFBLOCKS_RELEASED);
>
> Could you prevent the close() stalls by surrounding this with
> sb_start_write_trylock instead of passing transaction allocation flags
> all the way down?
>
> OFC that results in a messy if test:
>
> if (xfs_can_free_eofblocks(...) &&
> !xfs_iflags_test(...RELEASED) &&
> !sb_start_write_trylock(...)) {
> if (!xfs_iflags_test_and_set(...))
> xfs_free_eofblocks(ip);
> sb_end_write(...);
> }
>
> <shrug> Sorry if this is noise, I've been on vacation.
>
> --D
>
> > +
> > xfs_iunlock(ip, XFS_IOLOCK_EXCL);
> > }
> >
> > diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> > index 2040a9292ee6..c575b4acb24c 100644
> > --- a/fs/xfs/xfs_icache.c
> > +++ b/fs/xfs/xfs_icache.c
> > @@ -1259,7 +1259,7 @@ xfs_inode_free_eofblocks(
> > *lockflags |= XFS_IOLOCK_EXCL;
> >
> > if (xfs_can_free_eofblocks(ip))
> > - return xfs_free_eofblocks(ip);
> > + return xfs_free_eofblocks(ip, 0);
> >
> > /* inode could be preallocated */
> > trace_xfs_inode_free_eofblocks_invalid(ip);
> > diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> > index ddf2707c8894..14d3cd04a79f 100644
> > --- a/fs/xfs/xfs_inode.c
> > +++ b/fs/xfs/xfs_inode.c
> > @@ -1423,7 +1423,7 @@ xfs_inactive(
> > * reference to the inode at this point anyways.
> > */
> > if (xfs_can_free_eofblocks(ip))
> > - error = xfs_free_eofblocks(ip);
> > + error = xfs_free_eofblocks(ip, 0);
> >
> > goto out;
> > }
> > --
> > 2.47.3
> >
> >