Re: [RFC PATCH bpf-next 0/8] Socket migration for SO_REUSEPORT.
From: Kuniyuki Iwashima
Date: Thu Nov 19 2020 - 17:05:44 EST
From: Eric Dumazet <eric.dumazet@xxxxxxxxx>
Date: Wed, 18 Nov 2020 17:25:44 +0100
> On 11/17/20 10:40 AM, Kuniyuki Iwashima wrote:
> > The SO_REUSEPORT option allows sockets to listen on the same port and to
> > accept connections evenly. However, there is a defect in the current
> > implementation. When a SYN packet is received, the connection is tied to a
> > listening socket. Accordingly, when the listener is closed, in-flight
> > requests during the three-way handshake and child sockets in the accept
> > queue are dropped even if other listeners could accept such connections.
> >
> > This situation can happen when various server management tools restart
> > server (such as nginx) processes. For instance, when we change nginx
> > configurations and restart it, it spins up new workers that respect the new
> > configuration and closes all listeners on the old workers, resulting in
> > in-flight ACK of 3WHS is responded by RST.
> >
>
> I know some programs are simply removing a listener from the group,
> so that they no longer handle new SYN packets,
> and wait until all timers or 3WHS have completed before closing them.
>
> They pass fd of newly accepted children to more recent programs using af_unix fd passing,
> while in this draining mode.
Just out of curiosity, can I know the software for more study?
> Quite frankly, mixing eBPF in the picture is distracting.
I agree.
Also, I think eBPF itself is not always necessary in many cases and want
to make user programs simpler with this patchset.
The SO_REUSEPORT implementation is excellent to improve the scalability. On
the other hand, as a trade-off, users have to know deeply how the kernel
handles SYN packets and to implement connection draining by eBPF.
> It seems you want some way to transfer request sockets (and/or not yet accepted established ones)
> from fd1 to fd2, isn't it something that should be discussed independently ?
I understand that you are asking that I should discuss the issue and how to
transfer sockets independently. Please correct me if I have misunderstood
your question.
The kernel handles 3WHS and users cannot know its existence (without eBPF).
Many users believe SO_REUSEPORT should make it possible to distribute all
connections across available listeners ideally, but actually, there are
possibly some connections aborted silently. Some user may think that if the
kernel selected other listeners, the connections would not be dropped.
The root cause is within the kernel, so the issue should be addressed in
the kernel space and should not be visible to userspace. In order not to
make users bother with implementing new some stuff, I want to fix the root
cause by transferring sockets automatically so that users need not take
care of kernel implementation and connection draining.
Moreover, if possible, I did not want to mix eBPF with the issue. But there
may be some cases that different applications listen on the same port and
eBPF routes packets to each by some rules. In such cases, redistributing
sockets without user intention will break the application. This patchset
will work in many cases, but to care such cases, I added the eBPF part.