Re: [PATCH net] net/smc: avoid recursive sk_callback_lock in listen data_ready

From: Dust Li

Date: Fri Jun 19 2026 - 02:35:47 EST


On 2026-06-17 23:28:55, Runyu Xiao wrote:
>smc_listen() installs smc_clcsock_data_ready() as the underlying TCP
>listen socket's sk_data_ready callback. smc_clcsock_data_ready() then
>immediately takes sk_callback_lock before looking up the SMC listener and
>queuing smc_tcp_listen_work().
>
>That is unsafe once the TCP listen socket is leaving TCP_LISTEN. The TCP
>close/flush path can run the installed sk_data_ready callback with
>sk_callback_lock already held, so entering smc_clcsock_data_ready() again
>tries to take the same rwlock recursively in the same thread. The nvmet
>TCP listener had to make the same state check before taking
>sk_callback_lock for this reason.
>
>This issue was found by our static analysis tool and then manually
>reviewed against the current tree.
>
>The grounded PoC kept the SMC listen callback installation path:
>
>    smc_listen()
>      smc_clcsock_replace_cb()
>        sk_data_ready = smc_clcsock_data_ready()
>
>It then modeled the close/flush carrier that invokes the installed
>sk_data_ready callback while sk_callback_lock is already held. Lockdep
>reported the same-thread recursive acquisition:
>
> WARNING: possible recursive locking detected
> smc_clcsock_data_ready+0xa/0x4d [vuln_msv]
> smc_close_flush_work+0x1f/0x30 [vuln_msv]
> *** DEADLOCK ***
>
>Return before taking sk_callback_lock when the underlying TCP socket is no
>longer in TCP_LISTEN. In that state there is no listen accept work to
>queue for SMC, and avoiding the callback lock mirrors the fix used by the
>TCP nvmet listener.

Hi Runyu,

I noticed the lockdep splat comes from your own kernel module
([vuln_msv]) that models the condition, rather than from a real
TCP code path.

Could you point me to the specific mainline TCP code path that calls
sk_data_ready() while holding sk_callback_lock? If such a path
exists, I'm happy to take this patch. But if this is based solely on
static analysis without a confirmed real call chain, I'd prefer to
focus our review bandwidth on issues that have demonstrated impact.
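
For reference, this is how I read the proposed change, written as a
small userspace model rather than kernel code. Every name here
(model_sock, model_lock, model_data_ready, model_close_path) is
hypothetical and exists only for illustration. In particular,
model_close_path() stands in for the caller I am asking about; whether
any mainline path actually behaves like it is the open question.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for TCP_LISTEN and a non-listen state. */
enum model_state { MODEL_LISTEN, MODEL_CLOSE };

/* Hypothetical, single-threaded model of a socket and its callback lock. */
struct model_sock {
	enum model_state state;
	int lock_depth;       /* how many times this thread holds the lock */
	int queued_work;      /* how many times listen work was queued */
	bool recursion_seen;  /* set if the lock is taken while already held */
};

/* Models taking sk_callback_lock; records same-thread recursion. */
static void model_lock(struct model_sock *sk)
{
	if (sk->lock_depth > 0)
		sk->recursion_seen = true;
	sk->lock_depth++;
}

/* Models releasing sk_callback_lock. */
static void model_unlock(struct model_sock *sk)
{
	assert(sk->lock_depth > 0);
	sk->lock_depth--;
}

/* Models the data_ready callback with the proposed early return. */
static void model_data_ready(struct model_sock *sk)
{
	/* Proposed check: nothing to queue once the socket left listen. */
	if (sk->state != MODEL_LISTEN)
		return;

	model_lock(sk);
	sk->queued_work++;
	model_unlock(sk);
}

/* Assumed caller: invokes the callback with the lock already held. */
static void model_close_path(struct model_sock *sk)
{
	sk->state = MODEL_CLOSE;
	model_lock(sk);
	model_data_ready(sk);
	model_unlock(sk);
}
```

In this model, calling model_close_path() does not set recursion_seen,
because the callback returns before touching the lock. Without the
state check it would. That only matters if a real caller like
model_close_path() exists.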

Thanks,
Dust