Re: [PATCH v4 bpf-next 00/11] Socket migration for SO_REUSEPORT.
From: Martin KaFai Lau
Date: Wed May 05 2021 - 02:54:48 EST
On Thu, Apr 29, 2021 at 12:16:09PM +0900, Kuniyuki Iwashima wrote:
[ ... ]
> > > > It may be but perhaps its more flexible? It gives the new server the
> > > > chance to re-use the existing listen fds, close, drain and/or start new
> > > > ones. It also addresses the non-REUSEPORT case where you can't bind right
> > > > away.
> > > If the flexibility is really worth the complexity, we do not care about it.
> > > But, SO_REUSEPORT can give enough flexibility we want.
> > >
> > > With socket migration, there is no need to reuse listener (fd passing),
> > > drain children (incoming connections are automatically migrated if there is
> > > already another listener bind()ed), and of course another listener can
> > > close itself and migrated children.
> > >
> > > If two different approaches resolves the same issue and one does not need
> > > complexity in userspace, we select the simpler one.
> >
> > Kernel bloat and complexity is _not_ the simplest choice.
> >
> > Touching a complex part of TCP stack is quite risky.
>
> Yes, we understand that is not a simple decision and your concern. So many
> reviews are needed to see if our approach is really risky or not.

If fd passing is sufficient for a set of use cases, that is great.
However, it does not work well for everyone. We are not saying that
SO_REUSEPORT (+ optional bpf) is better in all cases either.

After SO_REUSEPORT was added, some people moved from fd passing
to SO_REUSEPORT and now have one bpf policy that selects the sk for
both TCP and UDP.
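
For readers less familiar with the mechanism, here is a minimal userspace
sketch of what SO_REUSEPORT provides without fd passing. It is not taken
from the patch set: the helper name reuseport_listeners and the loopback
address are made up for illustration, it assumes Linux, and it leaves out
the optional bpf selection policy entirely.

```python
import socket


def reuseport_listeners(count=2, host="127.0.0.1"):
    """Open `count` TCP listeners that share one port via SO_REUSEPORT.

    Hypothetical helper for illustration only; assumes Linux.
    """
    listeners = []
    port = 0  # let the kernel pick a free port for the first listener
    for _ in range(count):
        sk = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        # The option must be set on every socket before bind(),
        # otherwise the later bind() calls fail with EADDRINUSE.
        sk.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
        sk.bind((host, port))
        sk.listen(16)
        port = sk.getsockname()[1]  # later listeners bind the same port
        listeners.append(sk)
    return listeners


if __name__ == "__main__":
    old, new = reuseport_listeners()
    print("both listening on port", old.getsockname()[1])
    old.close()
    new.close()
```

Each listener here is an independent socket, so a new server process can
bind its own listener without inheriting an fd from the old one. Closing
one of these listeners while the others stay open is the situation this
set is concerned with.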

Since SO_REUSEPORT was first added, there have been multiple
contributions from different people and companies: first bpf
support for UDP, then for TCP, then a much more flexible way to select
an sk from reuseport_array, and then sock_map/sock_hash support. That is
another perspective showing that people find it useful. Each of those
contributions also changed the kernel code for practical use cases.

This set is an extension/improvement that addresses a gap in SO_REUSEPORT
when some of the sk are closed. Patches 2 to 4 are the prep work
in sock_reuseport.c and they have the most changes in this set.
Patches 5 to 7 are the changes in tcp. The code has been structured
to be as isolated as possible. It would be most useful to at least
review and get feedback on this part. The rest is bpf related.