[PATCH] mptcp: serialize subflow->closing with RX path

From: Kalpan Jani
Date: Thu May 07 2026 - 03:28:42 EST

There is a race between mptcp_data_ready() (RX path) and
mptcp_close_ssk() (teardown path) when accessing subflow->closing.

Currently, mptcp_data_ready() checks subflow->closing before acquiring
mptcp_data_lock(), while mptcp_close_ssk() may concurrently set
subflow->closing and purge backlog entries. This creates a classic
time-of-check to time-of-use (TOCTOU) race:

CPU A (close path)          CPU B (RX path)
----------------------      -------------------------
set closing = 1
                            read closing == 0
purge backlog
                            enqueue skb to backlog

As a result, an skb whose skb->sk still points to the subflow socket
(ssk) can be enqueued to the backlog after the subflow has been marked
closing and its backlog entries purged. This can lead to:

- WARN in inet_sock_destruct() due to non-zero sk_rmem_alloc
- potential use-after-free via stale skb->sk references

Fix this by serializing both the closing check and the backlog enqueue
under mptcp_data_lock(). This ensures that subflow->closing state and
backlog operations are observed atomically, preventing new skbs from
being enqueued once teardown begins.

Also protect backlog cleanup in mptcp_close_ssk() with the same lock
to guarantee mutual exclusion with the RX path.

This restores proper synchronization between RX and teardown paths
and prevents stale skb references to closing subflows.

Signed-off-by: Kalpan Jani <kalpan.jani@xxxxxxxxxxxxxxxxxx>
---
net/mptcp/protocol.c | 31 ++++++++++++++++++++++++++++---
1 file changed, 28 insertions(+), 3 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 718e910ff..295f8e1c0 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -910,14 +910,34 @@ void mptcp_data_ready(struct sock *sk, struct sock *ssk)
struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
struct mptcp_sock *msk = mptcp_sk(sk);

+ /*
+ * The close path can set subflow->closing while we are racing
+ * from BH context here. The old check was done before taking
+ * mptcp_data_lock(), leaving a TOCTOU window:
+ *
+ * CPU A: close path sets closing = 1 and purges backlog
+ * CPU B: already observed closing == 0 and later enqueues skb
+ *
+ * That skb keeps skb->sk == ssk and can later trigger:
+ * - WARN in inet_sock_destruct() (ssk->sk_rmem_alloc != 0)
+ * - UAF in backlog purge via stale skb->sk
+ */
+
/* The peer can send data while we are shutting down this
* subflow at subflow destruction time, but we must avoid enqueuing
* more data to the msk receive queue
*/
- if (unlikely(subflow->closing))
- return;

mptcp_data_lock(sk);
+
+ /* Serialize closing check with backlog enqueue */
+ if (unlikely(subflow->closing)) {
+ mptcp_data_unlock(sk);
+ return;
+ }
+
mptcp_rcv_rtt_update(msk, subflow);
if (!sock_owned_by_user(sk)) {
/* Wake-up the reader only for in-sequence data */
@@ -2653,9 +2673,12 @@ void mptcp_close_ssk(struct sock *sk, struct sock *ssk,
if (sk->sk_state == TCP_ESTABLISHED)
mptcp_event(MPTCP_EVENT_SUB_CLOSED, mptcp_sk(sk), ssk, GFP_KERNEL);

- /* Remove any reference from the backlog to this ssk; backlog skbs consume
+ /* Remove any reference from the backlog to this ssk.
+ * Serialize cleanup with RX-side enqueue using mptcp_data_lock().
+ * Backlog skbs consume
* space in the msk receive queue, no need to touch sk->sk_rmem_alloc
*/
+ mptcp_data_lock(sk);
list_for_each_entry(skb, &msk->backlog_list, list) {
if (skb->sk != ssk)
continue;
@@ -2663,6 +2686,8 @@ void mptcp_close_ssk(struct sock *sk, struct sock *ssk,
atomic_sub(skb->truesize, &skb->sk->sk_rmem_alloc);
skb->sk = NULL;
}
+ mptcp_data_unlock(sk);
+

/* subflow aborted before reaching the fully_established status
* attempt the creation of the next subflow
--
2.43.0