Re: [RFC] EADDRINUSE from bind() on application restart after killing

From: Eric Dumazet
Date: Fri Oct 14 2022 - 12:20:31 EST


On Fri, Oct 14, 2022 at 8:52 AM Paul Gofman <pgofman@xxxxxxxxxxxxxxx> wrote:
>
> Hello Eric,
>
> our problem is actually not with the accept socket / port for which
> those timeouts apply, we don't care for that temporary port number. The
> problem is that the listen port (to which apps bind explicitly) is also
> busy until the accept socket waits through all the necessary timeouts
> and is fully closed. From my reading of TCP specs I don't understand why
> it should be this way. The TCP hazards stipulating those timeouts seem
> to apply to accept (connection) socket / port only. Shouldn't listen
> socket's port (the only one we care about) be available for bind
> immediately after the app stops listening on it (either due to closing
> the listen socket or process force kill), or maybe have some other
> timeouts not related to connected accept socket / port hazards? Or am I
> missing something why it should be the way it is done now?
>


To quote your initial message :

<quote>
We are able to avoid this error by adding SO_REUSEADDR attribute to the
socket in a hack. But this hack cannot be added to the application
process as we don't own it.
</quote>

Essentially you are complaining that the Linux kernel is unable to
run a buggy application.

We are not going to change the Linux kernel because you cannot
fix/recompile an application.
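
For reference, the application-side fix is a single setsockopt() call
before bind(). A minimal sketch, assuming a plain IPv4 listener on
loopback; the helper name listen_reuseaddr() is made up for illustration
and is not taken from the application in question:

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a TCP listener on 127.0.0.1:port with
 * SO_REUSEADDR set before bind(), so that leftover TIME_WAIT sockets
 * from a previous instance do not cause EADDRINUSE.
 * Returns the listening fd, or -1 on error (errno is set).
 */
int listen_reuseaddr(unsigned short port)
{
	struct sockaddr_in sa;
	int one = 1;
	int fd;

	fd = socket(AF_INET, SOCK_STREAM, 0);
	if (fd < 0)
		return -1;

	/* Must be done before bind(), not after. */
	if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one)) < 0)
		goto fail;

	memset(&sa, 0, sizeof(sa));
	sa.sin_family = AF_INET;
	sa.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	sa.sin_port = htons(port);

	if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0)
		goto fail;
	if (listen(fd, SOMAXCONN) < 0)
		goto fail;

	return fd;
fail:
	close(fd);
	return -1;
}
```

Passing port 0 lets the kernel pick a free port, which is handy for a
quick test; a real server would pass its fixed listen port.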

Note that you could use LD_PRELOAD, or maybe eBPF, to automatically
turn on SO_REUSEADDR before bind().


> Thanks,
> Paul.
>
>
> On 9/30/22 10:16, Eric Dumazet wrote:
> > On Fri, Sep 30, 2022 at 6:24 AM Muhammad Usama Anjum
> > <usama.anjum@xxxxxxxxxxxxx> wrote:
> >> Hi Eric,
> >>
> >> RFC 1337 describes the TIME-WAIT Assassination Hazards in TCP. Because
> >> of these hazards we have a 60-second timeout in TIME_WAIT state if the
> >> connection isn't closed properly. From RFC 1337:
> >>> The TIME-WAIT delay allows all old duplicate segments time
> >> enough to die in the Internet before the connection is reopened.
> >>
> >> On localhost there is virtually no delay, so I think the TIME-WAIT
> >> delay could be zero for localhost connections. I'm no expert here,
> >> but why should we wait for 60 seconds to mitigate a hazard which
> >> isn't there?
> > Because we do not specialize TCP stack for loopback.
> >
> > It is easy to force delays even for loopback (tc qdisc add dev lo root
> > netem ...)
> >
> > You can avoid TCP complexity (cpu costs) over loopback using AF_UNIX instead.
> >
> > TIME_WAIT sockets are optional.
> > If you do not like them, simply set /proc/sys/net/ipv4/tcp_max_tw_buckets to 0 ?
> >
> > >> Zapping the sockets in TIME_WAIT and FIN_WAIT_2 does remove them. But
> > >> the zap requires a privileged (CAP_NET_ADMIN) process. We are having
> > >> a hard time finding a privileged process to do this.
> > Really, we are not going to add kludges in TCP stacks because of this reason.
> >
> >> Thanks,
> >> Usama
> >>
> >>
> >> On 5/24/22 1:18 PM, Muhammad Usama Anjum wrote:
> >>> Hello,
> >>>
> >>> We have a set of processes which talk with each other through a local
> >>> TCP socket. If the process(es) are killed (through SIGKILL) and
> >>> restarted at once, bind() fails with an EADDRINUSE error. This error
> >>> only appears if the application is restarted at once, without waiting
> >>> 60 seconds or more. It seems that there is a 60-second timeout during
> >>> which the previous TCP connection remains alive, waiting to be closed
> >>> completely. If we try to bind again within that time, we get the error.
> >>>
> >>> We are able to avoid this error by adding SO_REUSEADDR attribute to the
> >>> socket in a hack. But this hack cannot be added to the application
> >>> process as we don't own it.
> >>>
> >>> I've looked at the TCP connection states after killing processes in
> >>> different ways. The TCP connection ends up in 2 different states with
> >>> timeouts:
> >>>
> >>> (1) Timeout associated with FIN_WAIT_2 state which is set through
> >>> `tcp_fin_timeout` in procfs (60 seconds by default)
> >>>
> >>> (2) Timeout associated with TIME_WAIT state which cannot be changed. It
> >>> seems like this timeout has come from RFC 1337.
> >>>
> >>> The timeout in (1) can be changed. Timeout in (2) cannot be changed. It
> >>> also doesn't seem feasible to change the timeout of TIME_WAIT state as
> >>> the RFC mentions several hazards. But we are talking about a local TCP
> >>> connection where maybe those hazards aren't applicable directly? Is it
> >>> possible to change timeout for TIME_WAIT state for only local
> >>> connections without any hazards?
> >>>
> >>> We have tested a hack where, for local connections, we replace the
> >>> TIME_WAIT timeout with a value read from procfs. This solves our
> >>> problem and the application starts to work without any modifications.
> >>>
> >>> The question is: what would be the best possible solution here? Any
> >>> thoughts will be very helpful.
> >>>
> >>> Regards,
> >>>
> >> --
> >> Muhammad Usama Anjum
>
>