Re: [PATCH 2/2] tcp: fix forever orphan socket caused by tcp_abort
From: Youngmin Nam
Date: Mon Mar 17 2025 - 00:29:14 EST
On Fri, Mar 14, 2025 at 01:24:26PM +0100, Greg KH wrote:
> On Fri, Mar 14, 2025 at 06:24:46PM +0900, Youngmin Nam wrote:
> > From: Xueming Feng <kuro@xxxxxxxx>
> >
> > commit bac76cf89816bff06c4ec2f3df97dc34e150a1c4 upstream.
> >
> > We have had problems closing zero-window FIN-WAIT-1 TCP sockets in our
> > environment. This patch comes from the investigation.
> >
> > Previously, tcp_abort only sent a reset and called tcp_done when the
> > socket was not SOCK_DEAD, i.e. not an orphan. For an orphan socket, it
> > only purged the write queue; it did not close the socket, and left
> > that to the timers.
> >
> > While purging the write queue, tp->packets_out and sk->sk_write_queue
> > are cleared along the way. However, tcp_retransmit_timer has an early
> > return based on !tp->packets_out, and tcp_probe_timer has an early
> > return based on !sk->sk_write_queue.
> >
> > As a result, ICSK_TIME_RETRANS and ICSK_TIME_PROBE0 are never
> > rescheduled and the socket is never killed by the timers, turning a
> > zero-window orphan into a forever orphan.
> >
> > This patch removes the SOCK_DEAD check in tcp_abort, so that it sends
> > a reset to the peer and closes the socket accordingly, preventing the
> > timer-less orphan from occurring.
> >
> > According to Lorenzo's email in the v1 thread, the check was there to
> > prevent force-closing the same socket twice. That situation is handled
> > by testing for TCP_CLOSE inside lock, and returning -ENOENT if it is
> > already closed.
> >
> > The -ENOENT code comes from the associated patch Lorenzo made for
> > iproute2-ss (link attached below), which also conforms to RFC 9293.
> >
> > At the end of the patch, tcp_write_queue_purge(sk) is removed because it
> > was already called in tcp_done_with_error().
> >
> > p.s. This is the same patch as v2. Resent because it was mislabeled
> > "changes requested" on patchwork.kernel.org.
> >
> > Link: https://patchwork.ozlabs.org/project/netdev/patch/1450773094-7978-3-git-send-email-lorenzo@google.com/
> > Fixes: c1e64e298b8c ("net: diag: Support destroying TCP sockets.")
> > Signed-off-by: Xueming Feng <kuro@xxxxxxxx>
> > Tested-by: Lorenzo Colitti <lorenzo@xxxxxxxxxx>
> > Reviewed-by: Jason Xing <kerneljasonxing@xxxxxxxxx>
> > Reviewed-by: Eric Dumazet <edumazet@xxxxxxxxxx>
> > Link: https://patch.msgid.link/20240826102327.1461482-1-kuro@kuroa.me
> > Signed-off-by: Jakub Kicinski <kuba@xxxxxxxxxx>
> > Cc: <stable@xxxxxxxxxxxxxxx> # v5.10+
>
> Does not apply to 6.1.y or older, what did you want this applied to?
>
> thanks,
>
> greg k-h
>
Hi Greg,
Sorry about that. I will resend these patches for 6.1 and 5.15.
The 5.10 backport appears to need additional dependencies, so I think
it is best left to the maintainers to ensure it is done safely.
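
For anyone following the backport discussion, the failure mode described
in the quoted commit message can be sketched with a small user-space toy
model. This is only an illustration, not the kernel code: struct toy_sock
and every toy_* function are hypothetical names made up for this sketch,
and the timer logic is reduced to the two early-return conditions that
the commit message mentions.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel socket; not a real kernel type. */
struct toy_sock {
	bool dead;            /* models SOCK_DEAD (socket is an orphan) */
	bool closed;          /* models state == TCP_CLOSE */
	int  packets_out;     /* models tp->packets_out */
	int  write_queue_len; /* models a non-empty sk->sk_write_queue */
};

/* Purging the write queue clears both counters, as the commit describes. */
static void toy_purge_write_queue(struct toy_sock *sk)
{
	sk->packets_out = 0;
	sk->write_queue_len = 0;
}

/* Pre-patch behaviour: only a non-orphan is reset and closed. */
static void toy_abort_old(struct toy_sock *sk)
{
	if (!sk->dead)
		sk->closed = true; /* stands for "send reset, tcp_done" */
	toy_purge_write_queue(sk);
}

/* Post-patch behaviour: no SOCK_DEAD check, -ENOENT if already closed. */
static int toy_abort_new(struct toy_sock *sk)
{
	if (sk->closed)
		return -ENOENT;
	toy_purge_write_queue(sk); /* purge happens as part of closing */
	sk->closed = true;         /* stands for "send reset, close" */
	return 0;
}

/* Each timer returns early (does nothing) when its condition is false. */
static bool toy_retransmit_timer_runs(const struct toy_sock *sk)
{
	return sk->packets_out != 0;
}

static bool toy_probe_timer_runs(const struct toy_sock *sk)
{
	return sk->write_queue_len != 0;
}

/* An orphan that is not closed and that no timer will ever clean up. */
static bool toy_is_forever_orphan(const struct toy_sock *sk)
{
	return sk->dead && !sk->closed &&
	       !toy_retransmit_timer_runs(sk) &&
	       !toy_probe_timer_runs(sk);
}
```

In this model, calling toy_abort_old() on an orphan with queued data
leaves it in the "forever orphan" state, while toy_abort_new() closes it
and a second call returns -ENOENT, matching the double-abort handling
described in the commit message.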