Re: [PATCH bpf-next 1/2] bpf, net: Support SO_REUSEPORT sockets with bpf_sk_assign
From: Kuniyuki Iwashima
Date: Thu May 25 2023 - 22:50:06 EST
From: Kuniyuki Iwashima <kuniyu@xxxxxxxxxx>
Date: Thu, 25 May 2023 18:43:17 -0700
> From: Martin KaFai Lau <martin.lau@xxxxxxxxx>
> Date: Thu, 25 May 2023 16:42:46 -0700
> > On 5/25/23 1:19 AM, Lorenz Bauer wrote:
> > > diff --git a/include/net/inet6_hashtables.h b/include/net/inet6_hashtables.h
> > > index 56f1286583d3..3ba4dc2703da 100644
> > > --- a/include/net/inet6_hashtables.h
> > > +++ b/include/net/inet6_hashtables.h
> > > @@ -48,6 +48,13 @@ struct sock *__inet6_lookup_established(struct net *net,
> > > const u16 hnum, const int dif,
> > > const int sdif);
> > >
> > > +struct sock *inet6_lookup_reuseport(struct net *net, struct sock *sk,
> > > + struct sk_buff *skb, int doff,
> > > + const struct in6_addr *saddr,
> > > + __be16 sport,
> > > + const struct in6_addr *daddr,
> > > + unsigned short hnum);
> > > +
> > > struct sock *inet6_lookup_listener(struct net *net,
> > > struct inet_hashinfo *hashinfo,
> > > struct sk_buff *skb, int doff,
> > > @@ -85,14 +92,33 @@ static inline struct sock *__inet6_lookup_skb(struct inet_hashinfo *hashinfo,
> > > int iif, int sdif,
> > > bool *refcounted)
> > > {
> > > - struct sock *sk = skb_steal_sock(skb, refcounted);
> > > -
> > > + bool prefetched;
> > > + struct sock *sk = skb_steal_sock(skb, refcounted, &prefetched);
> > > + struct net *net = dev_net(skb_dst(skb)->dev);
> > > + const struct ipv6hdr *ip6h = ipv6_hdr(skb);
> > > +
> > > + if (prefetched) {
> > > + struct sock *reuse_sk = inet6_lookup_reuseport(net, sk, skb, doff,
> >
> > If sk is TCP_ESTABLISHED, I suspect sk->sk_reuseport is 1 (from sk_clone)?
>
> Exactly, it will cause null-ptr-deref in reuseport_select_sock().
Sorry, this doesn't occur; reuseport_select_sock() has a NULL check.
> We may want to use rcu_access_pointer(sk->sk_reuseport_cb) in
> each lookup_reuseport() instead of adding an sk_state check?
Also, if someone has a weird program that creates multiple listeners and
disables SO_REUSEPORT on the listener that is hit first in lhash2, checking
sk_reuseport_cb might not work. I hope no one does that, but checking
sk_reuseport and sk_state could be better.
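
To make the difference between the two checks concrete, here is a rough
userspace sketch. Everything in it is a stand-in: struct mock_sock, the
MOCK_* constants and both helper names are hypothetical and are not the
kernel's definitions. It also assumes that clearing SO_REUSEPORT on a
listener only clears the flag and leaves the group pointer in place, which
is my reading of the case above rather than something verified here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins only; these are NOT the kernel definitions. */
enum mock_state {
	MOCK_TCP_ESTABLISHED = 1,
	MOCK_TCP_LISTEN = 10,
};

struct mock_sock {
	int sk_state;
	bool sk_reuseport;           /* SO_REUSEPORT flag, inherited on clone */
	const void *sk_reuseport_cb; /* reuseport group, NULL if not in one */
};

/* Option A: look only at the group pointer. */
bool want_reuseport_by_cb(const struct mock_sock *sk)
{
	return sk->sk_reuseport_cb != NULL;
}

/* Option B: look at the flag and require a listening socket. */
bool want_reuseport_by_flag_and_state(const struct mock_sock *sk)
{
	return sk->sk_reuseport && sk->sk_state == MOCK_TCP_LISTEN;
}
```

For an established child (flag set, no group pointer) both helpers say
"skip the reuseport lookup". They disagree only for a listener that still
has a group pointer but has its flag cleared: option A would still run the
reuseport selection, option B would not.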
>
>
> >
> > If it is, it should still work other than an extra inet6_ehashfn. Is it worth
> > an extra sk->sk_state check or is it overkill?
> >
> >
> > > + &ip6h->saddr, sport,
> > > + &ip6h->daddr, ntohs(dport));