Re: [PATCH net v3] mptcp: fix soft lockup in mptcp_recvmsg()

From: Li Xiasong

Date: Mon Mar 30 2026 - 06:43:40 EST

Hi Matt,

On 3/27/2026 8:13 PM, Matthieu Baerts wrote:
> Hi Li,
>
> On 27/03/2026 08:55, Li Xiasong wrote:
>> syzbot reported a soft lockup in mptcp_recvmsg() [0].
>>
>> When receiving data with MSG_PEEK | MSG_WAITALL flags, the skb is not
>> removed from the sk_receive_queue. This causes sk_wait_data() to always
>> find available data and never perform actual waiting, leading to a soft
>> lockup.
>>
>> Fix this by adding a 'last' parameter to track the last peeked skb.
>> This allows sk_wait_data() to make informed waiting decisions and prevent
>> infinite loops when MSG_PEEK is used.
>
> Thank you for the new version!
>
> (...)
>
>> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
>> index cf1852b99963..60f6e6a189b7 100644
>> --- a/net/mptcp/protocol.c
>> +++ b/net/mptcp/protocol.c
>
> (...)
>
>> @@ -2343,7 +2347,7 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
>>
>> pr_debug("block timeout %ld\n", timeo);
>> mptcp_cleanup_rbuf(msk, copied);
>> - err = sk_wait_data(sk, &timeo, NULL);
>> + err = sk_wait_data(sk, &timeo, last);
>
> sashiko is saying [1] this:
>
>> Will this cause a soft lockup if all socket buffers in the receive queue
>> have already been peeked?
>>
>> If the queue only contains already-peeked buffers, the loop in
>> __mptcp_recvmsg_mskq() skips them using a continue statement:
>>
>>     if (flags & MSG_PEEK) {
>>         /* skip already peeked skbs */
>>         if (total_data_len + data_len <= copied_total) {
>>             total_data_len += data_len;
>>             continue;
>>         }
>>
>> This means last is never assigned and remains NULL.
>>
>> When last is NULL, sk_wait_data() checks if the receive queue tail is
>> not equal to NULL. Since the queue still contains the unconsumed buffers,
>> this evaluates to true and sk_wait_data() returns immediately without
>> sleeping.
>>
>> Does this result in an infinite loop here when MSG_PEEK and MSG_WAITALL
>> are used together?
>
> I *think* that's a false positive. When MSG_PEEK and MSG_WAITALL are
> used together, sk_wait_data() will be called with "last" != NULL and will
> be unblocked when a new data packet (skb->len > 0) is added to the
> queue. In other words, when walking the queue in __mptcp_recvmsg_mskq(),
> it should never be full of already-peeked skbs. Or did I miss something?
>
> If yes, "*last = skb" could be added before the "continue".
>
> If no, this patch can be applied in 'net' directly:
>
> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@xxxxxxxxxx>
>

Code analysis and testing confirm that the scenario sashiko described
can indeed occur:

When one thread is blocked in recvmsg() with MSG_PEEK | MSG_WAITALL,
waiting for more data, and another thread concurrently calls
shutdown(sock_fd, SHUT_WR), mptcp_close_wake_up() wakes up
sk_wait_data(), but sk->sk_state remains FIN_WAIT2. On the next pass
every skb in the queue has already been peeked, so 'last' stays NULL and
sk_wait_data() returns immediately. The result is a busy loop with the
CPU at 100%, which can in turn lead to a soft lockup.

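Adding '*last = skb' before the 'continue' fixes this: 'last' then points
at the last already-peeked skb, so sk_wait_data() sleeps instead of
returning immediately. I'll send v4 with this fix later.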
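
To illustrate why the extra assignment matters, here is a minimal userspace
sketch. It is a toy model, not the kernel code: struct toy_skb,
toy_peek_walk() and toy_wait_returns_immediately() are hypothetical names
made up for this example, and the wait check only mirrors the
"queue tail != last" comparison discussed above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct sk_buff: only the payload length matters here. */
struct toy_skb {
	size_t data_len;
	struct toy_skb *next;
};

/*
 * Toy model of the MSG_PEEK walk over the receive queue.
 * skbs fully covered by copied_total are skipped as "already peeked".
 * set_last_on_skip == false: 'last' is left untouched on skip (v3 behaviour).
 * set_last_on_skip == true:  models adding "*last = skb" before "continue".
 * Returns the number of newly peeked bytes.
 */
static size_t toy_peek_walk(struct toy_skb *head, size_t copied_total,
			    bool set_last_on_skip, struct toy_skb **last)
{
	size_t total_data_len = 0, copied = 0;
	struct toy_skb *skb;

	assert(last != NULL);

	for (skb = head; skb; skb = skb->next) {
		size_t offset;

		/* skip already peeked skbs */
		if (total_data_len + skb->data_len <= copied_total) {
			total_data_len += skb->data_len;
			if (set_last_on_skip)
				*last = skb;
			continue;
		}

		/* Part of this skb may have been peeked in an earlier pass. */
		offset = copied_total > total_data_len ?
			 copied_total - total_data_len : 0;
		copied += skb->data_len - offset;
		total_data_len += skb->data_len;
		*last = skb;
	}

	return copied;
}

/*
 * Toy model of the wait condition: data counts as "available" whenever
 * the queue tail differs from 'last', in which case the wait does not sleep.
 */
static bool toy_wait_returns_immediately(struct toy_skb *tail,
					 struct toy_skb *last)
{
	return tail != last;
}
```

With a queue of two fully peeked skbs, the walk without the assignment
leaves 'last' as NULL, so the modelled wait never sleeps. With the
assignment, 'last' equals the queue tail and the modelled wait blocks until
something new is queued.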

Thanks,
Li Xiasong