Re: [PATCH net] tcp: fix forever orphan socket caused by tcp_abort

From: Eric Dumazet
Date: Thu Aug 01 2024 - 09:15:37 EST


On Thu, Aug 1, 2024 at 1:17 PM Xueming Feng <kuro@xxxxxxxx> wrote:
>
> We have some problem closing zero-window fin-wait-1 tcp sockets in our
> environment. This patch come from the investigation.
>
> Previously tcp_abort only sends out reset and calls tcp_done when the
> socket is not SOCK_DEAD aka. orphan. For orphan socket, it will only
> purging the write queue, but not close the socket and left it to the
> timer.
>
> While purging the write queue, tp->packets_out and sk->sk_write_queue
> is cleared along the way. However tcp_retransmit_timer have early
> return based on !tp->packets_out and tcp_probe_timer have early
> return based on !sk->sk_write_queue.
>
> This caused ICSK_TIME_RETRANS and ICSK_TIME_PROBE0 not being resched
> and socket not being killed by the timers. Converting a zero-windowed
> orphan to a forever orphan.
>
> This patch removes the SOCK_DEAD check in tcp_abort, making it send
> reset to peer and close the socket accordingly. Preventing the
> timer-less orphan from happening.
>
> Fixes: e05836ac07c7 ("tcp: purge write queue upon aborting the connection")
> Fixes: bffd168c3fc5 ("tcp: clear tp->packets_out when purging write queue")
> Signed-off-by: Xueming Feng <kuro@xxxxxxxx>

This seems legit, but are you sure these two blamed commits added this bug ?

Even before them, we should have called tcp_done() right away, instead
of waiting for a (possibly long) timer to complete the job.

This might be important when killing millions of sockets on a busy server.

CC Lorenzo

Lorenzo, do you recall why your patch was testing the SOCK_DEAD flag ?

Thanks.