Re: [PATCH net v3] net: mptcp: fix unreleased socket in accept queue

From: Menglong Dong
Date: Thu Sep 08 2022 - 22:25:48 EST


On Thu, Sep 8, 2022 at 10:45 PM Paolo Abeni <pabeni@xxxxxxxxxx> wrote:
>
> On Thu, 2022-09-08 at 15:56 +0200, Matthieu Baerts wrote:
> > Hi Menglong,
> >
> > On 07/09/2022 13:11, menglong8.dong@xxxxxxxxx wrote:
> > > From: Menglong Dong <imagedong@xxxxxxxxxxx>
> > >
> > > The mptcp socket and its subflow sockets in accept queue can't be
> > > released after the process exit.
> > >
> > > While the release of a mptcp socket in listening state, the
> > > corresponding tcp socket will be released too. Meanwhile, the tcp
> > > socket in the unaccept queue will be released too. However, only init
> > > subflow is in the unaccept queue, and the joined subflow is not in the
> > > unaccept queue, which makes the joined subflow won't be released, and
> > > therefore the corresponding unaccepted mptcp socket will not be released
> > > to.
> >
> > Thank you for the v3.
> >
> > Unfortunately, our CI found a possible recursive locking:
> >
> > > - KVM Validation: debug:
> > > - Unstable: 1 failed test(s): selftest_mptcp_join - Critical: 1 Call Trace(s) ❌:
> > > - Task: https://cirrus-ci.com/task/5418283233968128
> > > - Summary: https://api.cirrus-ci.com/v1/artifact/task/5418283233968128/summary/summary.txt
> >
> > https://lore.kernel.org/mptcp/4e6d3d9e-1f1a-23ae-cb56-2d4f043f17ae@xxxxxxxxx/T/#u
> >
> > Do you mind looking at it please?
>
> Ah, that is actually a false positive, but we must silence it. The main
> point is that the lock_sock() in mptcp_close() rightfully lacks the
> _nested annotation.
>
> Instead of adding such annotation only for this call site, which would
> be both ugly and dangerous, I suggest to factor_out from mptcp_close()
> all the code the run under the socket lock, say in:
>
> bool __mptcp_close(struct sock *sk, long timeout)
> // return true if the caller need to cancel the mptcp worker
> // (outside the socket lock)
>
> and then in mptcp_subflow_queue_clean():
>
> sock_hold(sk);
>
> slow = lock_sock_fast_nested(sk);
> next = msk->dl_next;
> msk->first = NULL;
> msk->dl_next = NULL;
> do_cancel_work = __mptcp_close(sk, 0);
> unlock_sock_fast(sk, slow);
>
> if (do_cancel_work)
> mptcp_cancel_work(sk);
> sock_put(sk);
>
> All the above could require 2 different patches, 1 to factor-out the
> helper, and 1 to actually implement the fix.
>

Thanks for your explanation! As Matthieu said, I'll send the next
version to the MPTCP mailing list only.

Thanks!
Menglong Dong

> Cheers,
>
> Paolo
>