Re: [RFC PATCH bpf-next 7/8] bpf: Call bpf_run_sk_reuseport() for socket migration.
From: Kuniyuki Iwashima
Date: Thu Nov 19 2020 - 17:14:05 EST
From: Martin KaFai Lau <kafai@xxxxxx>
Date: Wed, 18 Nov 2020 17:00:45 -0800
> On Tue, Nov 17, 2020 at 06:40:22PM +0900, Kuniyuki Iwashima wrote:
> > This patch makes it possible to select a new listener for socket migration
> > by eBPF.
> >
> > The noteworthy point is that we select a listening socket in
> > reuseport_detach_sock() and reuseport_select_sock(), but we do not have
> > struct skb in the unhash path.
> >
> > Since we cannot pass skb to the eBPF program, we run only the
> > BPF_PROG_TYPE_SK_REUSEPORT program by calling bpf_run_sk_reuseport() with
> > skb NULL. So, some fields derived from skb are also NULL in the eBPF
> > program.
> More things need to be considered here when skb is NULL.
>
> Some helpers are probably assuming skb is not NULL.
>
> Also, the sk_lookup in filter.c is actually passing a NULL skb to avoid
> doing the reuseport select.
Honestly, I missed this point.
I wanted users to be able to reuse the same eBPF program seamlessly, but that seems unsafe.
> > Moreover, we can cancel migration by returning SK_DROP. This feature is
> > useful when listeners have different settings at the socket API level or
> > when we want to free resources as soon as possible.
> >
> > Reviewed-by: Benjamin Herrenschmidt <benh@xxxxxxxxxx>
> > Signed-off-by: Kuniyuki Iwashima <kuniyu@xxxxxxxxxxxx>
> > ---
> > net/core/filter.c | 26 +++++++++++++++++++++-----
> > net/core/sock_reuseport.c | 23 ++++++++++++++++++++---
> > net/ipv4/inet_hashtables.c | 2 +-
> > 3 files changed, 42 insertions(+), 9 deletions(-)
> >
> > diff --git a/net/core/filter.c b/net/core/filter.c
> > index 01e28f283962..ffc4591878b8 100644
> > --- a/net/core/filter.c
> > +++ b/net/core/filter.c
> > @@ -8914,6 +8914,22 @@ static u32 xdp_convert_ctx_access(enum bpf_access_type type,
> > SOCK_ADDR_LOAD_NESTED_FIELD_SIZE_OFF(S, NS, F, NF, \
> > BPF_FIELD_SIZEOF(NS, NF), 0)
> >
> > +#define SOCK_ADDR_LOAD_NESTED_FIELD_SIZE_OFF_OR_NULL(S, NS, F, NF, SIZE, OFF) \
> > + do { \
> > + *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(S, F), si->dst_reg, \
> > + si->src_reg, offsetof(S, F)); \
> > + *insn++ = BPF_JMP_IMM(BPF_JEQ, si->dst_reg, 0, 1); \
> Although it may not matter much, always doing this check is not ideal,
> considering that the fast path always has an skb and only the slow
> path (accept-queue migration) has a NULL skb. I think the req_sk usually
> has an skb as well, except in the timer case.
Yes, but the migration happens only when/after the listener is closed, so
I think it does not occur frequently enough to be a problem.
> My first thought is to create a temp skb, but that has its own issues.
> Or it may actually belong to a new prog type. However, let's keep
> exploring possible options (including NULL skb).
I also considered these two ideas, but the former would be a bit complicated,
and the latter would force users to implement a new eBPF program. I did not
want to put that extra burden on users, so I chose the NULL skb. However, it
is not safe, so adding a new prog type seems to be the better way.
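
To make the NULL skb concern concrete, here is a minimal userspace sketch.
It is not kernel or eBPF code, and every name in it (fake_skb,
reuseport_ctx_model, select_listener, VERDICT_*) is hypothetical and exists
only for illustration. It assumes a selector that normally picks a listener
from packet data and shows the explicit guard it would need once the same
program can also run on the migration path, where no packet exists.
VERDICT_DROP and VERDICT_PASS stand in for SK_DROP and SK_PASS.

```c
/* Userspace model only; all types and functions here are hypothetical. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for SK_DROP / SK_PASS. */
enum verdict { VERDICT_DROP = 0, VERDICT_PASS = 1 };

/* Toy packet: just a byte buffer and its length. */
struct fake_skb {
	const uint8_t *data;
	size_t len;
};

/* Toy program context. skb is NULL when called for migration. */
struct reuseport_ctx_model {
	const struct fake_skb *skb;
	uint32_t num_listeners;
	uint32_t hash;
};

/*
 * Pick a listener index. On the packet path the first payload byte
 * decides; on the migration path (skb == NULL) only the hash is
 * available. Returning VERDICT_DROP models cancelling the selection.
 */
static enum verdict select_listener(const struct reuseport_ctx_model *ctx,
				    uint32_t *index)
{
	assert(ctx != NULL && index != NULL);

	if (ctx->num_listeners == 0)
		return VERDICT_DROP;

	if (ctx->skb == NULL) {
		/* Without this guard the dereference below would crash. */
		*index = ctx->hash % ctx->num_listeners;
		return VERDICT_PASS;
	}

	if (ctx->skb->len < 1)
		return VERDICT_DROP;

	*index = ctx->skb->data[0] % ctx->num_listeners;
	return VERDICT_PASS;
}
```

A program written before migration existed would have no reason to contain
the skb == NULL branch, which is why silently reusing it looks unsafe.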