Re: [RFC PATCH bpf-next 0/8] Socket migration for SO_REUSEPORT.

From: Kuniyuki Iwashima
Date: Thu Nov 19 2020 - 17:18:03 EST


From: Martin KaFai Lau <kafai@xxxxxx>
Date: Wed, 18 Nov 2020 17:49:13 -0800
> On Tue, Nov 17, 2020 at 06:40:15PM +0900, Kuniyuki Iwashima wrote:
> > The SO_REUSEPORT option allows sockets to listen on the same port and to
> > accept connections evenly. However, there is a defect in the current
> > implementation. When a SYN packet is received, the connection is tied to a
> > listening socket. Accordingly, when the listener is closed, in-flight
> > requests during the three-way handshake and child sockets in the accept
> > queue are dropped even if other listeners could accept such connections.
> >
> > This situation can happen when various server management tools restart
> > server (such as nginx) processes. For instance, when we change nginx
> > configurations and restart it, it spins up new workers that respect the new
> > configuration and closes all listeners on the old workers, so that the
> > in-flight ACK of 3WHS is answered with RST.
> >
> > As a workaround for this issue, we can do connection draining by eBPF:
> >
> > 1. Before closing a listener, stop routing SYN packets to it.
> > 2. Wait enough time for requests to complete 3WHS.
> > 3. Accept connections until EAGAIN, then close the listener.
> >
> > Although this approach seems to work well, EAGAIN has nothing to do with
> > how many requests are still in 3WHS. Thus, we have to know the number
> It sounds like the application can already drain the established socket
> by accept()? To solve the problem that you have,
> does it mean migrating req_sk (the in-progress 3WHS) is enough?

Ideally, the application should need to drain only the sockets it has
already accepted, because 3WHS and tying a connection to a listener are
just kernel behaviour. Also, there are cases where we want to apply new
configurations as soon as possible, such as when replacing TLS
certificates.

It is possible to drain the established sockets by accept(), but the
sockets in the accept queue have not started application sessions yet. So,
if we did not have to drain such sockets (or if the kernel had happened to
select another listener for them), we could apply the new settings much
earlier.
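
For reference, the accept()-based draining (step 3 of the workaround quoted
above) can be sketched in userspace roughly as follows. This is only an
illustration, not code from the patch set: drain_listener() and its on_conn
callback are hypothetical names, and it assumes SYN packets are no longer
routed to this listener.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: accept every connection already sitting in the
 * accept queue of listener lfd, pass each one to on_conn() (or close it
 * if on_conn is NULL), then close the listener.
 *
 * Returns the number of drained connections, or -1 on error.
 */
static int drain_listener(int lfd, void (*on_conn)(int cfd))
{
	int drained = 0;
	int flags = fcntl(lfd, F_GETFL, 0);

	/* accept() must not block once the queue is empty. */
	if (flags < 0 || fcntl(lfd, F_SETFL, flags | O_NONBLOCK) < 0)
		return -1;

	for (;;) {
		int cfd = accept(lfd, NULL, NULL);

		if (cfd >= 0) {
			if (on_conn)
				on_conn(cfd);
			else
				close(cfd);
			drained++;
			continue;
		}

		if (errno == EINTR || errno == ECONNABORTED)
			continue;

		/*
		 * EAGAIN only means the accept queue is empty right now.
		 * It says nothing about requests that are still in 3WHS.
		 */
		if (errno == EAGAIN || errno == EWOULDBLOCK)
			break;

		return -1;
	}

	close(lfd);
	return drained;
}
```

Even with such a loop, every socket it returns is still bound to the old
process, which is the part that delays applying the new settings.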

Moreover, the established sockets may become long-lived connections, so
draining may not complete for a long time and we may have to force-close
them (whereas they would have a longer lifetime if they were migrated to a
new listener).


> Applications can already use the bpf prog to do (1) and divert
> the SYN to the newly started process.
>
> If the application cares about service disruption,
> it usually needs to drain the fd(s) that it already has and
> finishes serving the pending request (e.g. https) on them anyway.
> The time taken to finish those could already be longer than it takes
> to drain the accept queue or finish off the 3WHS in reasonable time.
> Or does the application that you have not need to drain the fd(s)
> it already has, so that it can close them immediately?

From the point of view of service disruption, I agree with you.

However, I think that there are some situations where we would rather apply
new configurations than drain sockets that use the old ones, and that if
the kernel migrates sockets automatically, we can simplify user programs.