Re: [PATCH v5 bpf-next 01/11] net: Introduce net.ipv4.tcp_migrate_req.

From: Kuniyuki Iwashima
Date: Sat May 15 2021 - 00:01:45 EST


From: Martin KaFai Lau <kafai@xxxxxx>
Date: Fri, 14 May 2021 17:47:20 -0700
> On Mon, May 10, 2021 at 12:44:23PM +0900, Kuniyuki Iwashima wrote:
> > This commit adds a new sysctl option: net.ipv4.tcp_migrate_req. If this
> > option is enabled or eBPF program is attached, we will be able to migrate
> > child sockets from a listener to another in the same reuseport group after
> > close() or shutdown() syscalls.
> >
> > Signed-off-by: Kuniyuki Iwashima <kuniyu@xxxxxxxxxxxx>
> > Reviewed-by: Benjamin Herrenschmidt <benh@xxxxxxxxxx>
> > ---
> > Documentation/networking/ip-sysctl.rst | 20 ++++++++++++++++++++
> > include/net/netns/ipv4.h | 1 +
> > net/ipv4/sysctl_net_ipv4.c | 9 +++++++++
> > 3 files changed, 30 insertions(+)
> >
> > diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst
> > index c2ecc9894fd0..8e92f9b28aad 100644
> > --- a/Documentation/networking/ip-sysctl.rst
> > +++ b/Documentation/networking/ip-sysctl.rst
> > @@ -732,6 +732,26 @@ tcp_syncookies - INTEGER
> > network connections you can set this knob to 2 to enable
> > unconditionally generation of syncookies.
> >
> > +tcp_migrate_req - INTEGER
> > + The incoming connection is tied to a specific listening socket when
> > + the initial SYN packet is received during the three-way handshake.
> > + When a listener is closed, in-flight request sockets during the
> > + handshake and established sockets in the accept queue are aborted.
> > +
> > + If the listener has SO_REUSEPORT enabled, other listeners on the
> > + same port should have been able to accept such connections. This
> > + option makes it possible to migrate such child sockets to another
> > + listener after close() or shutdown().
> > +
> > + Default: 0
> > +
> > + Note that the source and destination listeners MUST have the same
> > + settings at the socket API level. If different applications listen
> It is a bit confusing on what "source and destination listeners" and
> "same settings at the socket API level" mean.
>
> Does it mean to say a bpf prog should usually be used to define the policy
> to pick an alive listener? If the bpf prog is absent, the kernel will
> randomly pick an alive listener only if this sysctl is enabled?

Yes.

If two listeners have different setsockopt() settings and no eBPF prog is
attached, the random pick may crash applications.

Let's say the migration happens from listener A to B, and only B has
TCP_SAVE_SYN enabled. Then B cannot read the SYN of requests migrated
from A, because A never saved it.
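
To illustrate the TCP_SAVE_SYN case from userspace, here is a minimal
sketch. It is not part of this series, and the helper names
saved_syn_len() and accept_and_check() are hypothetical. It only shows
that a child accepted from a listener without TCP_SAVE_SYN has no saved
SYN, which is what B would see for a child migrated from A. An
application that assumes the SYN is always there would misbehave.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Fallbacks for older libc headers; values match the Linux UAPI. */
#ifndef TCP_SAVE_SYN
#define TCP_SAVE_SYN 27
#endif
#ifndef TCP_SAVED_SYN
#define TCP_SAVED_SYN 28
#endif

/* Hypothetical helper: number of saved SYN header bytes on an accepted
 * socket, 0 if the listener did not save the SYN, -1 on error.
 */
static int saved_syn_len(int child_fd)
{
	unsigned char buf[512];
	socklen_t len = sizeof(buf);

	if (getsockopt(child_fd, IPPROTO_TCP, TCP_SAVED_SYN, buf, &len) < 0)
		return -1;

	return (int)len;
}

/* Hypothetical helper: accept one loopback connection on a fresh
 * listener, with or without TCP_SAVE_SYN, and report what the accepted
 * child sees.
 */
static int accept_and_check(int save_syn)
{
	struct sockaddr_in addr;
	socklen_t alen = sizeof(addr);
	int lfd, cfd, child, ret = -1;

	lfd = socket(AF_INET, SOCK_STREAM, 0);
	cfd = socket(AF_INET, SOCK_STREAM, 0);
	if (lfd < 0 || cfd < 0)
		goto out;

	if (save_syn && setsockopt(lfd, IPPROTO_TCP, TCP_SAVE_SYN,
				   &save_syn, sizeof(save_syn)) < 0)
		goto out;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr.sin_port = 0;	/* let the kernel pick a free port */

	if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
	    listen(lfd, 1) < 0 ||
	    getsockname(lfd, (struct sockaddr *)&addr, &alen) < 0)
		goto out;

	if (connect(cfd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
		goto out;

	child = accept(lfd, NULL, NULL);
	if (child < 0)
		goto out;

	ret = saved_syn_len(child);
	close(child);
out:
	if (cfd >= 0)
		close(cfd);
	if (lfd >= 0)
		close(lfd);
	return ret;
}
```

accept_and_check(1) models listener B's own children and
accept_and_check(0) models what B would get for a child that was
created under A.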

I wrote this in the commit log in v2, but somehow dropped it in v3...
https://lore.kernel.org/netdev/20201207132456.65472-7-kuniyu@xxxxxxxxxxxx/

I will make the description more specific.


>
> Others lgtm.
>
> Acked-by: Martin KaFai Lau <kafai@xxxxxx>

Thank you!


>
> > + on the same port, disable this option or attach the
> > + BPF_SK_REUSEPORT_SELECT_OR_MIGRATE type of eBPF program to select
> > + the correct socket by bpf_sk_select_reuseport() or to cancel
> > + migration by returning SK_DROP.
> > +