Re: [EXTERNAL] [PATCH v3 05/11] ceph: add client reset state machine and session teardown

From: Viacheslav Dubeyko

Date: Wed May 06 2026 - 17:29:45 EST


On Wed, 2026-05-06 at 14:39 +0300, Alex Markuze wrote:
> Hi Slava,
>
> Thanks for the thorough review, several good points here.
>
> -EIO mapping for blocked callers
>
> The design intent is that internal work-function errors e.g.,
> -ENOMEM from kcalloc, or a transient encoding failure, should not leak
> to unrelated callers such as open() or flock(). These callers did not
> trigger the reset and have no way to
> act on "reset ran out of memory." The detailed error is preserved in
> debugfs status and tracepoints for the operator who triggered the
> reset.
>
> That said, I agree that -EIO is broad. The challenge is: what would
> be more useful to the caller? The caller's only real action is "retry
> later" regardless of whether the reset failed due to -ENOMEM or
> -ETIMEDOUT internally. If you have a
> specific error code in mind that would be more informative without
> leaking internal details, I'm open to it.
>

It's hard to advise something useful here. But if the caller's only real action
is "retry later", then, maybe, -EAGAIN could be used here?

> msleep() for close grace period
>
> I share your discomfort with msleep() in kernel code. The difficulty
> is that there is no completion event for "the REQUEST_CLOSE message
> has been transmitted on the wire." The messenger queues the message
> and returns immediately.
> The grace period is purely best-effort. The MDS uses
> session_autoclose as a fallback if it never receives the close.
>
> What event would you suggest waiting on here? One option is to wait
> for the session state to transition; the MDS sends a SESSION_CLOSE
> response, but that reintroduces the stalemate problem.
> If the MDS is stuck, we'd wait forever for something that will never
> come, which is exactly what the reset is trying to break. I'm open to
> alternatives if you have a pattern in mind.

I see the point. Yes, it's complicated of suggesting something more useful from
my side.

>
> out_sessions skipping ceph_mdsc_reset_complete()
>
> Yes, this is intentional. The out_sessions path is reached only when
> st->shutdown is true, meaning ceph_mdsc_destroy() has already taken
> ownership of the final state transition. destroy() sets phase to IDLE,
> sets last_errno to -ESHUTDOWN,
> and wakes blocked waiters itself. If the work function also called
> reset_complete(), it would race with destroy() and potentially
> overwrite the shutdown state. The comment on the shutdown check tries
> to explain this but perhaps it could be
> clearer. Would adding a comment at the out_sessions label help?
>

I am slightly lost the context here. :) But, I believe that adding the comment
could makes the situation better.

Thanks,
Slava.