Re: [PATCH v1 08/13] ceph: make ceph_start_io_write() killable
From: Viacheslav Dubeyko
Date: Thu Mar 12 2026 - 16:02:57 EST
On Thu, 2026-03-12 at 10:16 +0200, Ionut Nechita (Wind River) wrote:
> From: Ionut Nechita <ionut.nechita@xxxxxxxxxxxxx>
>
> When multiple processes write to the same file and one of them is
> blocked waiting for MDS/OSD response (e.g., during MDS failover),
> other processes block indefinitely on down_write(&inode->i_rwsem)
> in ceph_start_io_write().
>
> This causes hung task warnings:
>
> INFO: task dd:12345 blocked for more than 122 seconds.
> Call Trace:
> ceph_start_io_write+0x...
> ceph_write_iter+0x...
>
> The i_rwsem is held by a process doing fsync/writeback that is
> waiting for MDS or OSD response. Other writers queue up on the
> rwsem and block indefinitely.
>
> Fix this by using down_write_killable() instead of down_write().
> This allows blocked processes to be killed with SIGKILL, preventing
> indefinite hangs. The function now returns an error code that
> callers must check.
>
> Update ceph_write_iter() to handle the new error return from
> ceph_start_io_write().
>
> Signed-off-by: Ionut Nechita <ionut.nechita@xxxxxxxxxxxxx>
> ---
> fs/ceph/file.c | 9 +++++++--
> fs/ceph/io.c | 9 +++++++--
> fs/ceph/io.h | 2 +-
> 3 files changed, 15 insertions(+), 5 deletions(-)
>
> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> index 6587c2d5af1e0..01e4f31b1f2f3 100644
> --- a/fs/ceph/file.c
> +++ b/fs/ceph/file.c
> @@ -2359,8 +2359,13 @@ static ssize_t ceph_write_iter(struct kiocb *iocb, struct iov_iter *from)
> retry_snap:
> if (direct_lock)
> ceph_start_io_direct(inode);
> - else
> - ceph_start_io_write(inode);
> + else {
> + err = ceph_start_io_write(inode);
> + if (err) {
> + ceph_free_cap_flush(prealloc_cf);
> + return err;
> + }
> + }
>
> if (iocb->ki_flags & IOCB_APPEND) {
> err = ceph_do_getattr(inode, CEPH_STAT_CAP_SIZE, false);
> diff --git a/fs/ceph/io.c b/fs/ceph/io.c
> index c456509b31c3f..f9ac89ec1d6a1 100644
> --- a/fs/ceph/io.c
> +++ b/fs/ceph/io.c
> @@ -83,11 +83,16 @@ ceph_end_io_read(struct inode *inode)
> * Declare that a buffered write operation is about to start, and ensure
> * that we block all direct I/O.
> */
> -void
> +int
> ceph_start_io_write(struct inode *inode)
> {
> - down_write(&inode->i_rwsem);
> + int ret;
> +
> + ret = down_write_killable(&inode->i_rwsem);
> + if (ret)
> + return ret;
> ceph_block_o_direct(ceph_inode(inode), inode);
> + return 0;
> }
Which kernel version is this patch based on? v7.0-rc3 already has this [1]:
/**
 * ceph_start_io_write - declare the file is being used for buffered writes
 * @inode: file inode
 *
 * Declare that a buffered write operation is about to start, and ensure
 * that we block all direct I/O.
 */
int ceph_start_io_write(struct inode *inode)
{
	int err = down_write_killable(&inode->i_rwsem);
	if (!err)
		ceph_block_o_direct(ceph_inode(inode), inode);
	return err;
}
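
For anyone following the thread, the caller-side contract can be sketched in
userspace roughly as below. This is only an illustration under assumptions:
every mock_* name is hypothetical and not a kernel or ceph symbol, the
"rwsem" is a plain flag, and a pending SIGKILL is simulated with a boolean.
The one kernel fact it relies on is that down_write_killable() returns
-EINTR when the waiter receives a fatal signal.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an inode; not the kernel's struct inode. */
struct mock_inode {
	bool locked;       /* true while the mock "rwsem" is held for write */
	bool fatal_signal; /* simulates a SIGKILL pending on the waiter */
};

/* Models down_write_killable(): 0 on success, -EINTR if killed. */
static int mock_down_write_killable(struct mock_inode *inode)
{
	if (inode->fatal_signal)
		return -EINTR;
	inode->locked = true;
	return 0;
}

static void mock_up_write(struct mock_inode *inode)
{
	inode->locked = false;
}

/* Same shape as the function above; the O_DIRECT blocking is not modeled. */
static int mock_start_io_write(struct mock_inode *inode)
{
	return mock_down_write_killable(inode);
}

/*
 * Caller sketch: a resource is preallocated before the lock is taken, so
 * the error path must release it and must not unlock a lock never held.
 */
static long mock_write_iter(struct mock_inode *inode, size_t len)
{
	void *prealloc = malloc(16);
	int err;

	if (!prealloc)
		return -ENOMEM;

	err = mock_start_io_write(inode);
	if (err) {
		free(prealloc); /* lock was not acquired: only free and bail */
		return err;
	}

	/* ... the actual write would happen here ... */

	mock_up_write(inode);
	free(prealloc);
	return (long)len;
}
```

The point of the sketch is only the ordering: the failure return happens
before any unlock, and whatever was allocated ahead of the lock is released
on that path.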
Thanks,
Slava.
>
> /**
> diff --git a/fs/ceph/io.h b/fs/ceph/io.h
> index fa594cd77348a..94ce176df9997 100644
> --- a/fs/ceph/io.h
> +++ b/fs/ceph/io.h
> @@ -4,7 +4,7 @@
>
> void ceph_start_io_read(struct inode *inode);
> void ceph_end_io_read(struct inode *inode);
> -void ceph_start_io_write(struct inode *inode);
> +int ceph_start_io_write(struct inode *inode);
> void ceph_end_io_write(struct inode *inode);
> void ceph_start_io_direct(struct inode *inode);
> void ceph_end_io_direct(struct inode *inode);
[1] https://elixir.bootlin.com/linux/v7.0-rc3/source/fs/ceph/io.c#L110