[PATCH v1 08/13] ceph: make ceph_start_io_write() killable
From: Ionut Nechita (Wind River)
Date: Thu Mar 12 2026 - 04:19:28 EST
From: Ionut Nechita <ionut.nechita@xxxxxxxxxxxxx>
When multiple processes write to the same file and one of them is
blocked waiting for an MDS/OSD response (e.g., during MDS failover),
the other processes block indefinitely on down_write(&inode->i_rwsem)
in ceph_start_io_write().

This causes hung task warnings:

  INFO: task dd:12345 blocked for more than 122 seconds.
  Call Trace:
    ceph_start_io_write+0x...
    ceph_write_iter+0x...

The i_rwsem is held by a process doing fsync/writeback that is
waiting for an MDS or OSD response. Other writers queue up on the
rwsem in uninterruptible sleep and cannot be killed.

Fix this by using down_write_killable() instead of down_write().
A blocked writer can now be killed with SIGKILL: the lock attempt
fails with -EINTR instead of waiting forever. ceph_start_io_write()
now returns an int that callers must check.

Update ceph_write_iter() to handle the new error return from
ceph_start_io_write().

Signed-off-by: Ionut Nechita <ionut.nechita@xxxxxxxxxxxxx>
---
fs/ceph/file.c | 9 +++++++--
fs/ceph/io.c | 9 +++++++--
fs/ceph/io.h | 2 +-
3 files changed, 15 insertions(+), 5 deletions(-)
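
For reviewers, the calling convention this patch introduces can be
sketched in user space as follows. This is only an illustrative model,
not kernel code: struct model_inode and every model_*() function are
hypothetical stand-ins invented for this note, and the fatal_signal
flag simply simulates a SIGKILL arriving while the task waits for the
lock. The point it shows is that when the killable lock attempt fails,
the lock is not held, so the caller must return the error without
running the unlock path.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for an inode; not the kernel struct. */
struct model_inode {
	bool locked;           /* models i_rwsem being held for write */
	bool o_direct_blocked; /* models the effect of ceph_block_o_direct() */
};

/*
 * Hypothetical stand-in for down_write_killable(): returns -EINTR and
 * leaves the lock untaken when a fatal signal is simulated, otherwise
 * takes the lock and returns 0.
 */
static int model_down_write_killable(struct model_inode *ino, bool fatal_signal)
{
	if (fatal_signal)
		return -EINTR;
	ino->locked = true;
	return 0;
}

/* Hypothetical stand-in for the unlock done on the end-of-write path. */
static void model_end_io_write(struct model_inode *ino)
{
	ino->locked = false;
}

/* Models the new int-returning ceph_start_io_write(). */
static int model_start_io_write(struct model_inode *ino, bool fatal_signal)
{
	int ret = model_down_write_killable(ino, fatal_signal);

	if (ret)
		return ret; /* lock not held, nothing to undo */
	ino->o_direct_blocked = true;
	return 0;
}

/* Models a caller such as ceph_write_iter() checking the new return value. */
static int model_write(struct model_inode *ino, bool fatal_signal)
{
	int err = model_start_io_write(ino, fatal_signal);

	if (err)
		return err; /* bail out before the unlock path */

	/* ... the buffered write itself would happen here ... */

	model_end_io_write(ino);
	return 0;
}
```

In this model, model_write() returns 0 on the normal path and -EINTR
when the simulated fatal signal interrupts the lock attempt; in both
cases the lock ends up released or never taken.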
diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 6587c2d5af1e0..01e4f31b1f2f3 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -2359,8 +2359,13 @@ static ssize_t ceph_write_iter(struct kiocb *iocb, struct iov_iter *from)
 retry_snap:
 	if (direct_lock)
 		ceph_start_io_direct(inode);
-	else
-		ceph_start_io_write(inode);
+	else {
+		err = ceph_start_io_write(inode);
+		if (err) {
+			ceph_free_cap_flush(prealloc_cf);
+			return err;
+		}
+	}
 
 	if (iocb->ki_flags & IOCB_APPEND) {
 		err = ceph_do_getattr(inode, CEPH_STAT_CAP_SIZE, false);
diff --git a/fs/ceph/io.c b/fs/ceph/io.c
index c456509b31c3f..f9ac89ec1d6a1 100644
--- a/fs/ceph/io.c
+++ b/fs/ceph/io.c
@@ -83,11 +83,16 @@ ceph_end_io_read(struct inode *inode)
  * Declare that a buffered write operation is about to start, and ensure
  * that we block all direct I/O.
  */
-void
+int
 ceph_start_io_write(struct inode *inode)
 {
-	down_write(&inode->i_rwsem);
+	int ret;
+
+	ret = down_write_killable(&inode->i_rwsem);
+	if (ret)
+		return ret;
 	ceph_block_o_direct(ceph_inode(inode), inode);
+	return 0;
 }
 
 /**
diff --git a/fs/ceph/io.h b/fs/ceph/io.h
index fa594cd77348a..94ce176df9997 100644
--- a/fs/ceph/io.h
+++ b/fs/ceph/io.h
@@ -4,7 +4,7 @@
void ceph_start_io_read(struct inode *inode);
void ceph_end_io_read(struct inode *inode);
-void ceph_start_io_write(struct inode *inode);
+int ceph_start_io_write(struct inode *inode);
void ceph_end_io_write(struct inode *inode);
void ceph_start_io_direct(struct inode *inode);
void ceph_end_io_direct(struct inode *inode);
--
2.53.0