Re: [PATCH v1 05/13] ceph: add timeout protection to ceph_lock_wait_for_completion()

From: Viacheslav Dubeyko

Date: Thu Mar 12 2026 - 15:40:19 EST

On Thu, 2026-03-12 at 10:16 +0200, Ionut Nechita (Wind River) wrote:
> From: Ionut Nechita <ionut.nechita@xxxxxxxxxxxxx>
>
> When a file lock operation is interrupted and an unlock request is
> sent to cancel it, ceph_lock_wait_for_completion() waits indefinitely
> for r_safe_completion using wait_for_completion_killable().
> If the MDS becomes unreachable after the unlock request is sent,
> this wait will block indefinitely, causing hung task warnings:
> INFO: task flock:12345 blocked for more than 122 seconds.
> Call Trace:
> wait_for_completion_killable+0x...
> ceph_lock_wait_for_completion+0x...
> ceph_flock+0x...
> This is similar to the issue fixed in ceph_mdsc_sync() where
> indefinite waits on r_safe_completion can hang when MDS is
> unavailable.
> Fix this by using wait_for_completion_killable_timeout() with
> mount_timeout instead of the indefinite wait. On timeout, return
> -ETIMEDOUT to the caller. The lock state remains consistent because:
> 1. If the unlock succeeded on MDS, the lock is released
> 2. If the unlock didn't reach MDS, the original lock request
> was already aborted (CEPH_MDS_R_ABORTED set), so MDS will
> clean it up on reconnect
> This follows the same timeout pattern used throughout the ceph
> client for MDS operations.
> Signed-off-by: Ionut Nechita <ionut.nechita@xxxxxxxxxxxxx>
> ---
> fs/ceph/locks.c | 14 +++++++++++++-
> 1 file changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/fs/ceph/locks.c b/fs/ceph/locks.c
> index ebf4ac0055ddc..55dd99460b81a 100644
> --- a/fs/ceph/locks.c
> +++ b/fs/ceph/locks.c
> @@ -160,6 +160,8 @@ static int ceph_lock_wait_for_completion(struct ceph_mds_client *mdsc,
> struct ceph_mds_request *req)
> {
> struct ceph_client *cl = mdsc->fsc->client;
> + struct ceph_options *opts = mdsc->fsc->client->options;
> + unsigned long timeout = ceph_timeout_jiffies(opts->mount_timeout);

The opts->mount_timeout could be configured unreasonably. Should we do something
about it?

> struct ceph_mds_request *intr_req;
> struct inode *inode = req->r_inode;
> int err, lock_type;
> @@ -221,7 +223,17 @@ static int ceph_lock_wait_for_completion(struct ceph_mds_client *mdsc,
> if (err && err != -ERESTARTSYS)
> return err;
>
> - wait_for_completion_killable(&req->r_safe_completion);
> + err = wait_for_completion_killable_timeout(&req->r_safe_completion,
> + timeout);
> + if (err == -ERESTARTSYS) {

Interesting... You didn't check this in other patches. Why? :)

> + /* Interrupted again, just return the error */
> + return err;
> + }
> + if (err == 0) {
> + pr_warn_client(cl, "lock request tid %llu safe completion timed out\n",
> + req->r_tid);

The same concern about sending warning into system log here.

> + return -ETIMEDOUT;
> + }

Maybe, some style cleanup here:

if (err == -ERESTARTSYS) {
<logic_1>
} else if (err == 0) {
<logic_2>
}

Thanks,
Slava.

> return 0;
> }
>