Re: [PATCH] tcp: check socket state before calling WARN_ON

From: Youngmin Nam
Date: Thu Mar 13 2025 - 22:45:57 EST

Next message: Adam Li: "Re: [PATCH v2 3/3] sched/fair: Update comments regarding last and skip buddy"
Previous message: Binbin Wu: "Re: [PATCH 2/4] KVM: VMX: Move x86_ops wrappers under CONFIG_KVM_INTEL_TDX"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Sat, Mar 01, 2025 at 02:37:23PM +0900, Youngmin Nam wrote:
> On Tue, Feb 25, 2025 at 12:24:47PM -0500, Neal Cardwell wrote:
> > On Mon, Feb 24, 2025 at 4:13 PM Neal Cardwell <ncardwell@xxxxxxxxxx> wrote:
> > >
> > > On Mon, Feb 3, 2025 at 12:17 AM Youngmin Nam <youngmin.nam@xxxxxxxxxxx> wrote:
> > > >
> > > > > Hi Neal,
> > > > > Thank you for looking into this issue.
> > > > > When we first encountered this issue, we also suspected that tcp_write_queue_purge() was being called.
> > > > > We can provide any information you would like to inspect.
> > >
> > > Thanks again for raising this issue, and providing all that data!
> > >
> > > I've come up with a reproducer for this issue, and an explanation for
> > > why this has only been seen on Android so far, and a theory about a
> > > related socket leak issue, and a proposed fix for the WARN and the
> > > socket leak.
> > >
> > > Here is the scenario:
> > >
> > > + user process A has a socket in TCP_ESTABLISHED
> > >
> > > + user process A calls close(fd)
> > >
> > > + socket calls __tcp_close() and tcp_close_state() decides to enter
> > > TCP_FIN_WAIT1 and send a FIN
> > >
> > > + FIN is lost and retransmitted, making the state:
> > > ---
> > > tp->packets_out = 1
> > > tp->sacked_out = 0
> > > tp->lost_out = 1
> > > tp->retrans_out = 1
> > > ---
> > >
> > > + someone invokes "ss" to --kill the socket using the functionality in
> > > (1e64e298b8 "net: diag: Support destroying TCP sockets")
> > >
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c1e64e298b8cad309091b95d8436a0255c84f54a
> > >
> > > (note: this was added for Android, so would not be surprising to have
> > > this inet_diag --kill run on Android)
> > >
> > > + the ss --kill causes a call to tcp_abort()
> > >
> > > + tcp_abort() calls tcp_write_queue_purge()
> > >
> > > + tcp_write_queue_purge() sets packets_out=0 but leaves lost_out=1,
> > > retrans_out=1
> > >
> > > + tcp_sock still exists in TCP_FIN_WAIT1 but now with an inconsistent state
> > >
> > > + ACK arrives and causes a WARN_ON from tcp_verify_left_out():
> > >
> > > #define tcp_verify_left_out(tp) WARN_ON(tcp_left_out(tp) > tp->packets_out)
> > >
> > > because the state has:
> > >
> > > ---
> > > tcp_left_out(tp) = sacked_out + lost_out = 1
> > > tp->packets_out = 0
> > > ---
> > >
> > > because the state is:
> > >
> > > ---
> > > tp->packets_out = 0
> > > tp->sacked_out = 0
> > > tp->lost_out = 1
> > > tp->retrans_out = 1
> > > ---
> > >
> > > I guess perhaps one fix would be to just have tcp_write_queue_purge()
> > > zero out those other fields:
> > >
> > > ---
> > > tp->sacked_out = 0
> > > tp->lost_out = 0
> > > tp->retrans_out = 0
> > > ---
> > >
> > > However, there is a related and worse problem. Because this killed
> > > socket has tp->packets_out, the next time the RTO timer fires,
> > > tcp_retransmit_timer() notices !tp->packets_out is true, so it short
> > > circuits and returns without setting another RTO timer or checking to
> > > see if the socket should be deleted. So the tcp_sock is now sitting in
> > > memory with no timer set to delete it. So we could leak a socket this
> > > way. So AFAICT to fix this socket leak problem, perhaps we want a
> > > patch like the following (not tested yet), so that we delete all
> > > killed sockets immediately, whether they are SOCK_DEAD (orphans for
> > > which the user already called close() or not) :
> > >
> > > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> > > index 28cf19317b6c2..a266078b8ec8c 100644
> > > --- a/net/ipv4/tcp.c
> > > +++ b/net/ipv4/tcp.c
> > > @@ -5563,15 +5563,12 @@ int tcp_abort(struct sock *sk, int err)
> > > local_bh_disable();
> > > bh_lock_sock(sk);
> > >
> > > - if (!sock_flag(sk, SOCK_DEAD)) {
> > > - if (tcp_need_reset(sk->sk_state))
> > > - tcp_send_active_reset(sk, GFP_ATOMIC);
> > > - tcp_done_with_error(sk, err);
> > > - }
> > > + if (tcp_need_reset(sk->sk_state))
> > > + tcp_send_active_reset(sk, GFP_ATOMIC);
> > > + tcp_done_with_error(sk, err);
> > >
> > > bh_unlock_sock(sk);
> > > local_bh_enable();
> > > - tcp_write_queue_purge(sk);
> > > release_sock(sk);
> > > return 0;
> > > }
> > > ---
> >
> > Actually, it seems like a similar fix was already merged into Linux v6.11:
> >
> > bac76cf89816b tcp: fix forever orphan socket caused by tcp_abort
> >
> > Details below.
> >
> > Youngmin, does your kernel have this bac76cf89816b fix? If not, can
> > you please cherry-pick this fix and retest?
> >
> > Thanks!
> > neal
>
> Hi Neal.
>
> Thank you for your effort in debugging this issue with me.
> I also appreciate your detailed explanation and for finding the patch related to the issue.
>
> Our kernel(an Android kernel based on 6.6 LTS) does not have the patch you mentioned.(bac76cf89816b)
>
> I'll let you know the test results after applying the patch.
>
> Thank you.
>

Hi Neal.

There were confilcts when I cherry-picked the patch
"bac76cf89816b tcp: fix forever orphan socket caused by tcp_abort"
and it also has a dependency.

I believe we need a total of two patches,
which exist in the latest upstream for 6.6 LTS and earlier LTS trees.

5ce4645c23cf tcp: fix races in tcp_abort()
bac76cf89816 tcp: fix forever orphan socket caused by tcp_abort

So, I uploaded CLs for this to android kernel gerrit.

https://android-review.googlesource.com/c/kernel/common/+/3530435 BACKPORT: tcp: fix races in tcp_abort()
https://android-review.googlesource.com/c/kernel/common/+/3530436 BACKPORT: tcp: fix forever orphan socket caused by tcp_abort

After applying these patches, we conducted a total of three rounds of testing with 200 devices,
and all tests passed.

Thanks.

Next message: Adam Li: "Re: [PATCH v2 3/3] sched/fair: Update comments regarding last and skip buddy"
Previous message: Binbin Wu: "Re: [PATCH 2/4] KVM: VMX: Move x86_ops wrappers under CONFIG_KVM_INTEL_TDX"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]