Re: [PATCH 1/2] ksmbd: rewrite stop_sessions() with restartable iteration
From: Namjae Jeon
Date: Sun Apr 26 2026 - 19:21:15 EST
On Sat, Apr 25, 2026 at 6:38 PM DaeMyung Kang <charsyam@xxxxxxxxx> wrote:
>
> stop_sessions() walks conn_list with hash_for_each() and, for every
> entry, drops conn_list_lock across the transport ->shutdown() call
> before re-acquiring the read lock to continue the loop. The hash
> walk relies on cross-iteration state (the current bucket and the
> hlist position), which is not preserved across unlock/relock: if
> another thread performs a list mutation during the unlocked window,
> the ongoing iteration becomes unreliable and can re-visit
> connections that have already been handled or skip connections that
> have not. The outer `if (!hash_empty(conn_list)) goto again;` retry
> masks the symptom in the common case but does not address the
> unsafe iteration itself.
>
> Reframe the loop so it never relies on iterator state across
> unlock/relock. Under conn_list_lock held for read, pick the first
> connection whose ->shutdown() has not yet been issued by this path,
> pin it by taking an extra reference, record that fact on the
> connection and mark it EXITING while still inside the locked walk,
> then drop the lock. Then call ->shutdown() outside the lock, drop
> the pin (freeing the connection if the handler already released its
> reference), and restart from the top.
>
> Use a new per-connection flag, conn->stop_called, as the "shutdown
> issued from stop_sessions()" marker rather than reusing the status
> state. ksmbd_conn_set_exiting() is also invoked by
> ksmbd_sessions_deregister() on sibling channels of a multichannel
> session without issuing a transport shutdown, so treating
> KSMBD_SESS_EXITING as "already handled here" would skip connections
> that still need shutdown() to wake their handler out of recv(),
> leaving the outer retry waiting indefinitely for the hash to drain.
> stop_sessions() is serialised by init_lock in
> ksmbd_conn_transport_destroy(), so writing stop_called under the
> read lock has no other writer.
>
> Set EXITING inside the locked walk so the selection, the stop_called
> marker, and the status transition all happen together, and guard
> against regressing a connection that has already advanced to
> KSMBD_SESS_RELEASING on its own (for example, if the handler exited
> its receive loop for an unrelated reason between teardown steps).
>
> When the pin drop is the last put, release the transport and pair
> ida_destroy(&target->async_ida) with the ida_init() done in
> ksmbd_conn_alloc(), so stop_sessions() retiring a connection on its
> own does not leak the xarray backing of the embedded async_ida.
>
> The outer retry with msleep() is kept to wait for handler threads to
> reach ksmbd_conn_free() and drain the hash.
>
> Observed with an instrumented build that logs one line per visit and
> widens the unlocked window before ->shutdown() by 200 ms, under
> five concurrent cifs mounts (nosharesock, one connection each):
>
> * Current code: the same connection address is revisited many
> times during a single stop_sessions() call and ->shutdown() is
> invoked well beyond the number of live connections before the
> hash finally drains.
>
> * Rewritten code: each live connection produces exactly one
> ->shutdown() call; the function returns as soon as the hash is
> empty.
>
> Functional teardown via `ksmbd.control --shutdown` with the same
> five mounts completes cleanly on the rewritten path.
>
> Performance is observably unchanged. Tearing down N concurrent
> nosharesock cifs connections with `ksmbd.control --shutdown` +
> `rmmod ksmbd` takes essentially the same wall time before and after
> the rewrite:
>
> N before after
> 10 4.93s 5.34s
> 30 7.34s 7.03s
> 50 7.31s 7.01s (3-run avg: 7.04s vs 7.25s)
> 100 6.98s 6.78s
> 200 6.77s 6.89s
>
> and the number of ->shutdown() calls equals the number of live
> connections on both paths when the race is not widened. The
> teardown is dominated by the msleep(100)-based outer retry waiting
> for handler threads to run ksmbd_conn_free(), not by the iteration
> itself; the restartable loop's worst-case O(N^2) visit cost is in
> the microseconds even at N=200 and sits far below the msleep(100)
> granularity.
>
> Applied alone on top of ksmbd-for-next-next, this patch does not
> introduce a new leak site. Under the same reproducer (10x
> concurrent-holders + ss -K + ksmbd.control --shutdown + rmmod), the
> tree still shows the pre-existing per-connection transport leak
> count that arises when the last refcount drop lands in one of
> ksmbd_conn_r_count_dec(), __free_opinfo() or session_fd_check() -
> all of which end with a bare kfree() today. kmemleak backtraces
> for the unreferenced objects point into the TCP accept path
> (sk_clone -> inet_csk_clone_lock, sock_alloc_inode) and none
> involve stop_sessions(). Plugging those bare-kfree sites is the
> responsibility of the follow-up patch.
>
> Fixes: e2f34481b24d ("cifsd: add server-side procedures for SMB3")
> Cc: stable@xxxxxxxxxxxxxxx
> Signed-off-by: DaeMyung Kang <charsyam@xxxxxxxxx>
Applied it to #ksmbd-for-next-next.
Thanks!