Re: [PATCH v3 bpf-next 00/11] Socket migration for SO_REUSEPORT.

From: Kuniyuki Iwashima
Date: Wed Apr 21 2021 - 07:31:06 EST


From: Eric Dumazet <eric.dumazet@xxxxxxxxx>
Date: Tue, 20 Apr 2021 18:43:36 +0200
> On 4/20/21 5:41 PM, Kuniyuki Iwashima wrote:
> > The SO_REUSEPORT option allows sockets to listen on the same port and to
> > accept connections evenly. However, there is a defect in the current
> > implementation [1]. When a SYN packet is received, the connection is tied
> > to a listening socket. Accordingly, when the listener is closed, in-flight
> > requests during the three-way handshake and child sockets in the accept
> > queue are dropped even if other listeners on the same port could accept
> > such connections.
> >
> > This situation can happen when various server management tools restart
> > server (such as nginx) processes. For instance, when we change nginx
> > configurations and restart it, it spins up new workers that respect the new
> > configuration and closes all listeners on the old workers, resulting in the
> > in-flight ACK of 3WHS is responded by RST.
> >
> > The SO_REUSEPORT option is excellent to improve scalability.
>
> This was before the SYN processing was made lockless.
>
> I really wonder if we still need SO_REUSEPORT for TCP ?

Sorry, this may have been misleading. That was an old topic in v3.5. Also,
scalability and performance are no longer the primary reasons to use
SO_REUSEPORT.

There are cases that need SO_REUSEPORT for other reasons.

If a server handles both UDP and TCP requests (for example, a proxy for
QUIC and HTTP/2), it is nice to have the same eBPF mechanism for both.

Also, regarding configuration reloads, some applications want to keep the
reload simple by just replacing processes.

Then, even with the new accept() syscall, I think some migration (of the
queue or of the children) would still be needed. If it were done by
something like fd passing, it might not work when the process dies in the
middle of the fd passing.
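
To make the fd-passing alternative concrete, here is a minimal sketch of
handing a listener over with SCM_RIGHTS. It is only an illustration, not
part of the patchset: both ends run in one process for brevity, the UNIX
socket pair stands in for whatever channel two real processes would share,
and it assumes Linux with Python 3.9+ (for socket.send_fds/recv_fds).

```python
import socket

# The "old" process owns a listening socket on an ephemeral loopback port.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(16)
port = listener.getsockname()[1]

# A UNIX socket pair stands in for the channel between old and new process.
old_end, new_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

# Old side: send the listener's fd as SCM_RIGHTS ancillary data, then drop
# its own copy. The fd in flight keeps the listening socket alive.
socket.send_fds(old_end, [b"listener"], [listener.fileno()])
listener.close()

# New side: receive the fd and wrap it in a socket object.
msg, fds, _flags, _addr = socket.recv_fds(new_end, 1024, 1)
inherited = socket.socket(fileno=fds[0])
inherited.settimeout(2)
inherited_port = inherited.getsockname()[1]

# A client connecting to the original port is now accepted by the new side.
client = socket.create_connection(("127.0.0.1", port), timeout=2)
conn, _ = inherited.accept()
handed_over = conn.getpeername() == client.getsockname()

for s in (conn, client, inherited, old_end, new_end):
    s.close()
```

The handoff only works if the old side lives long enough to reach
send_fds(). If it dies before that point, the listener and everything
queued on it go away with it, which is the failure mode described above.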

So, I think it is better to do the migration in the kernel, without any
interaction with the old process.

In this respect, SO_REUSEPORT is good because we can bind a new process
without interaction with the old process. And with this patchset, we can
migrate requests by calling close()/shutdown() on the old listener.
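
As a rough illustration of the "no interaction" point, the sketch below
binds a second SO_REUSEPORT listener to a port that is already in use and
then closes the first one. The helper name make_listener is made up for
this example, and both listeners live in one process for brevity. It
assumes Linux, and it does not exercise the migration itself: whether
requests still queued on the old listener survive the close() depends on
this patchset.

```python
import socket

def make_listener(port: int = 0) -> socket.socket:
    """Hypothetical helper: a TCP listener with SO_REUSEPORT set before bind()."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    s.bind(("127.0.0.1", port))
    s.listen(16)
    s.settimeout(2)
    return s

# "Old" worker: picks an ephemeral port.
old = make_listener()
port = old.getsockname()[1]

# "New" worker: binds to the same port without asking the old one for anything.
new = make_listener(port)
same_port = new.getsockname()[1] == port

# The old worker goes away; connections arriving afterwards reach the new one.
old.close()
client = socket.create_connection(("127.0.0.1", port), timeout=2)
conn, _ = new.accept()
accepted_by_new = conn.getpeername() == client.getsockname()

for s in (conn, client, new):
    s.close()
```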


>
> Eventually a new accept() system call where different threads
> can express how they want to choose the children sockets would
> be less invasive.
>
> Instead of having many listeners, have one listener and eventually multiple
> accept queues to improve scalability of accept() phase.

That sounds interesting. Could you elaborate on the idea?

And sorry, I couldn't understand exactly what "invasive" means here. Does
it mean the new accept() would need fewer changes, have a simpler API, or
something else?

Also, I wonder whether the new accept() would have flexibility similar to
what eBPF provides.