RE: [RFC PATCH bpf-next 0/8] Socket migration for SO_REUSEPORT.

From: Kuniyuki Iwashima
Date: Thu Nov 19 2020 - 17:02:15 EST


From: David Laight <David.Laight@xxxxxxxxxx>
Date: Wed, 18 Nov 2020 09:18:24 +0000
> From: Kuniyuki Iwashima
> > Sent: 17 November 2020 09:40
> >
> > The SO_REUSEPORT option allows sockets to listen on the same port and to
> > accept connections evenly. However, there is a defect in the current
> > implementation. When a SYN packet is received, the connection is tied to a
> > listening socket. Accordingly, when the listener is closed, in-flight
> > requests during the three-way handshake and child sockets in the accept
> > queue are dropped even if other listeners could accept such connections.
> >
> > This situation can happen when various server management tools restart
> > server (such as nginx) processes. For instance, when we change nginx
> > configurations and restart it, it spins up new workers that respect the new
> > configuration and closes all listeners on the old workers, resulting in
> > in-flight ACK of 3WHS is responded by RST.
>
> Can't you do something to stop new connections being queued (like
> setting the 'backlog' to zero), then carry on doing accept()s
> for a guard time (or until the queue length is zero) before finally
> closing the listening socket.

Yes, but with eBPF.
There are some ideas suggested and well discussed in the thread below,
resulting in that connection draining by eBPF was merged.
https://lore.kernel.org/netdev/1443313848-751-1-git-send-email-tolga.ceylan@xxxxxxxxx/


Also, setting zero to backlog does not work well.
https://lore.kernel.org/netdev/1447262610.17135.114.camel@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/

---8<---
From: Eric Dumazet <eric.dumazet@xxxxxxxxx>
Subject: Re: [PATCH 1/1] net: Add SO_REUSEPORT_LISTEN_OFF socket option as
drain mode
Date: Wed, 11 Nov 2015 09:23:30 -0800
> Actually listen(fd, 0) is not going to work well :
>
> For request_sock that were created (by incoming SYN packet) before this
> listen(fd, 0) call, the 3rd packet (ACK coming from client) would not be
> able to create a child attached to this listener.
>
> sk_acceptq_is_full() test in tcp_v4_syn_recv_sock() would simply drop
> the thing.
---8<---