[PATCH v2] xfs: prevent close() from hanging on frozen filesystems
From: Aditya Srivastava
Date: Thu Jun 11 2026 - 14:06:18 EST
From: Aditya Prakash Srivastava <aditya.ansh182@xxxxxxxxx>
When a file with active speculative post-EOF preallocations is closed,
xfs_file_release() synchronously triggers xfs_free_eofblocks() to clean
them up. This requires allocating a write transaction (xfs_trans_alloc),
which blocks indefinitely if the filesystem is currently frozen or in the
process of freezing, as it waits to acquire the superblock's write lock
via sb_start_intwrite().
As a result, a close() system call on a read-write file descriptor can
hang indefinitely in percpu_rwsem_wait() until the filesystem is thawed,
even if the file is closed by a non-writer process or after all writing
activity has already ceased.
This issue has been seen across multiple downstream environments and has a
long history of causing severe system disruption. For example:
- Downstream Red Hat Bugzilla 1474726 (dating back to 2017) details
complete system hangs during system backups when rsync and fsfreeze
are used. Even seemingly harmless read-only commands like
'cat /var/log/messages' would hang on close() in __sb_start_write
via xfs_free_eofblocks, requiring a hard reboot.
- Downstream LeApp integration test scenarios (e.g. systemd-rsync migration
checks) consistently hit this hang when trying to freeze the system.
Historically, XFS maintainers dismissed this behavior as NOTABUG, claiming
that close() is not a read-only operation and is expected to block since it
allocates write transactions. However, this behavior is highly disruptive.
User-space applications view close() as a resource reclamation system call,
not a write operation, and do not expect it to block. Hanging on close()
frequently triggers container healthcheck failures, systemd service
timeouts, and cluster failover cascades.
Additionally, no other major Linux filesystem (such as ext4 or btrfs)
synchronously allocates write transactions during close() system calls,
making this hang a highly unexpected and disruptive behavior unique to XFS.
To fix this properly and avoid any potential race conditions where a freeze
might come in immediately after a writable check (such as xfs_fs_writable),
we introduce a new transaction allocation flag: XFS_TRANS_TRYLOCK.
When XFS_TRANS_TRYLOCK is set, __xfs_trans_alloc() attempts to obtain
freeze protection using sb_start_intwrite_trylock() instead of blocking
on sb_start_intwrite(). If the trylock fails, the allocation is aborted
gracefully, freeing the newly allocated transaction handle and returning
-EAGAIN.
We then extend xfs_free_eofblocks() to accept transaction allocation flags,
and pass XFS_TRANS_TRYLOCK during xfs_file_release(). If the call returns
-EAGAIN, we cleanly clear the XFS_EOFBLOCKS_RELEASED flag so that
subsequent releases or the background blockgc garbage collector can get
another chance to clean up these speculative preallocations once the
filesystem thaws.
This approach is safe, race-free, and ensures close() never blocks on
frozen filesystems.
As requested, I have added a GPLv2-compatible license to the C reproducer
provided below, and I will be working on wiring up this reproducer into
the standard xfstests suite in the near future.
A simple GPLv2-licensed C reproducer demonstrating the hang (compile
with -pthread):
/*
* GPLv2-compatible XFS freeze close() hang reproducer.
* Copyright (c) 2026 Aditya Prakash Srivastava. All Rights Reserved.
*/
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <pthread.h>
#include <sys/ioctl.h>
#include <sys/vfs.h>
#include <linux/fs.h>
#include <libgen.h>
volatile int close_started = 0;
volatile int close_completed = 0;
void *close_thread(void *arg) {
int fd = *(int *)arg;
close_started = 1;
close(fd);
close_completed = 1;
return NULL;
}
int main(int argc, char *argv[]) {
struct statfs sfs;
statfs(argv[1], &sfs);
if (sfs.f_type != 0x58465342) return 1;
int freeze_fd = open(dirname(strdup(argv[1])), O_RDONLY);
int write_fd = open(argv[1], O_WRONLY | O_CREAT | O_TRUNC, 0644);
char buf[65536] = {0};
for (int i = 0; i < 320; i++) write(write_fd, buf, sizeof(buf));
ioctl(freeze_fd, FIFREEZE, 0);
pthread_t thread;
pthread_create(&thread, NULL, close_thread, &write_fd);
while (!close_started) usleep(1000);
usleep(1000000); // Wait 1s
if (!close_completed) printf("SUCCESS: close() hung!\\n");
ioctl(freeze_fd, FITHAW, 0);
pthread_join(thread, NULL);
unlink(argv[1]);
return 0;
}
Link: https://bugzilla.kernel.org/show_bug.cgi?id=205833
Link: https://bugzilla.redhat.com/show_bug.cgi?id=1474726
Suggested-by: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Signed-off-by: Aditya Prakash Srivastava <aditya.ansh182@xxxxxxxxx>
---
Changes since v1:
- Drop the racy xfs_fs_writable() check from the previous version.
- Introduce the new XFS_TRANS_TRYLOCK transaction flag.
- Attempt non-blocking superblock locking via sb_start_intwrite_trylock()
when XFS_TRANS_TRYLOCK is set, returning -EAGAIN on failure.
- Update xfs_free_eofblocks() to accept flags, allowing xfs_file_release()
to handle -EAGAIN and gracefully retry/defer post-thaw.
- Add GPLv2-compatible licensing to the C reproducer.
fs/xfs/libxfs/xfs_shared.h | 3 +++
fs/xfs/xfs_bmap_util.c | 9 +++++----
fs/xfs/xfs_bmap_util.h | 2 +-
fs/xfs/xfs_file.c | 13 +++++++++++--
fs/xfs/xfs_icache.c | 2 +-
fs/xfs/xfs_inode.c | 2 +-
fs/xfs/xfs_trans.c | 18 ++++++++++++++++--
7 files changed, 38 insertions(+), 11 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_shared.h b/fs/xfs/libxfs/xfs_shared.h
index b1e0d9bc1f7d..6b2a1026ff60 100644
--- a/fs/xfs/libxfs/xfs_shared.h
+++ b/fs/xfs/libxfs/xfs_shared.h
@@ -164,6 +164,9 @@ void xfs_log_get_max_trans_res(struct xfs_mount *mp,
/* Transaction has locked the rtbitmap and rtsum inodes */
#define XFS_TRANS_RTBITMAP_LOCKED (1u << 9)
+/* Try lock filesystem superblock for freeze protection */
+#define XFS_TRANS_TRYLOCK (1u << 10)
+
/*
* Field values for xfs_trans_mod_sb.
*/
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 0ab00615f1ad..56c7919bf1e5 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -574,7 +574,8 @@ xfs_can_free_eofblocks(
*/
int
xfs_free_eofblocks(
- struct xfs_inode *ip)
+ struct xfs_inode *ip,
+ uint flags)
{
struct xfs_trans *tp;
struct xfs_mount *mp = ip->i_mount;
@@ -604,9 +605,9 @@ xfs_free_eofblocks(
return 0;
}
- error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, 0, 0, 0, &tp);
+ error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, 0, 0, flags, &tp);
if (error) {
- ASSERT(xfs_is_shutdown(mp));
+ ASSERT(error == -EAGAIN || xfs_is_shutdown(mp));
return error;
}
@@ -928,7 +929,7 @@ xfs_prepare_shift(
* into the accessible region of the file.
*/
if (xfs_can_free_eofblocks(ip)) {
- error = xfs_free_eofblocks(ip);
+ error = xfs_free_eofblocks(ip, 0);
if (error)
return error;
}
diff --git a/fs/xfs/xfs_bmap_util.h b/fs/xfs/xfs_bmap_util.h
index c477b3361630..227371579205 100644
--- a/fs/xfs/xfs_bmap_util.h
+++ b/fs/xfs/xfs_bmap_util.h
@@ -66,7 +66,7 @@ int xfs_insert_file_space(struct xfs_inode *, xfs_off_t offset,
/* EOF block manipulation functions */
bool xfs_can_free_eofblocks(struct xfs_inode *ip);
-int xfs_free_eofblocks(struct xfs_inode *ip);
+int xfs_free_eofblocks(struct xfs_inode *ip, uint flags);
int xfs_swap_extents(struct xfs_inode *ip, struct xfs_inode *tip,
struct xfs_swapext *sx);
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 845a97c9b063..16aba428854e 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -1807,8 +1807,17 @@ xfs_file_release(
if (!xfs_iflags_test(ip, XFS_EOFBLOCKS_RELEASED) &&
xfs_ilock_nowait(ip, XFS_IOLOCK_EXCL)) {
if (xfs_can_free_eofblocks(ip) &&
- !xfs_iflags_test_and_set(ip, XFS_EOFBLOCKS_RELEASED))
- xfs_free_eofblocks(ip);
+ !xfs_iflags_test_and_set(ip, XFS_EOFBLOCKS_RELEASED)) {
+ int error = xfs_free_eofblocks(ip, XFS_TRANS_TRYLOCK);
+
+ /*
+ * If transaction allocation fails due to a frozen/freezing
+ * filesystem, clear the released flag so that subsequent
+ * releases or background blockgc can retry post-thaw.
+ */
+ if (error == -EAGAIN)
+ xfs_iflags_clear(ip, XFS_EOFBLOCKS_RELEASED);
+ }
xfs_iunlock(ip, XFS_IOLOCK_EXCL);
}
diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index 2040a9292ee6..c575b4acb24c 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -1259,7 +1259,7 @@ xfs_inode_free_eofblocks(
*lockflags |= XFS_IOLOCK_EXCL;
if (xfs_can_free_eofblocks(ip))
- return xfs_free_eofblocks(ip);
+ return xfs_free_eofblocks(ip, 0);
/* inode could be preallocated */
trace_xfs_inode_free_eofblocks_invalid(ip);
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 9978ac1422fc..1c75f51af4d7 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -1423,7 +1423,7 @@ xfs_inactive(
* reference to the inode at this point anyways.
*/
if (xfs_can_free_eofblocks(ip))
- error = xfs_free_eofblocks(ip);
+ error = xfs_free_eofblocks(ip, 0);
goto out;
}
diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index 148cc32449c1..b214ef5c28c1 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -218,8 +218,20 @@ __xfs_trans_alloc(
ASSERT(!(flags & XFS_TRANS_RES_FDBLKS) || xfs_has_lazysbcount(mp));
tp = kmem_cache_zalloc(xfs_trans_cache, GFP_KERNEL | __GFP_NOFAIL);
- if (!(flags & XFS_TRANS_NO_WRITECOUNT))
- sb_start_intwrite(mp->m_super);
+ if (!(flags & XFS_TRANS_NO_WRITECOUNT)) {
+ /*
+ * If TRYLOCK is specified, attempt a non-blocking lock to
+ * avoid blocking indefinitely on frozen/freezing filesystems.
+ */
+ if (flags & XFS_TRANS_TRYLOCK) {
+ if (!sb_start_intwrite_trylock(mp->m_super)) {
+ kmem_cache_free(xfs_trans_cache, tp);
+ return ERR_PTR(-EAGAIN);
+ }
+ } else {
+ sb_start_intwrite(mp->m_super);
+ }
+ }
xfs_trans_set_context(tp);
tp->t_flags = flags;
tp->t_mountp = mp;
@@ -252,6 +264,8 @@ xfs_trans_alloc(
*/
retry:
tp = __xfs_trans_alloc(mp, flags);
+ if (IS_ERR(tp))
+ return PTR_ERR(tp);
WARN_ON(mp->m_super->s_writers.frozen == SB_FREEZE_COMPLETE);
error = xfs_trans_reserve(tp, resp, blocks, rtextents);
if (error == -ENOSPC && want_retry) {
--
2.47.3