Re: [PATCH bpf-next] bpf: Add document for 'dst_port' of 'struct bpf_sock'

From: Martin KaFai Lau
Date: Mon Jan 24 2022 - 22:01:44 EST


On Thu, Jan 20, 2022 at 09:17:27PM -0800, Alexei Starovoitov wrote:
> On Thu, Jan 20, 2022 at 6:18 AM Menglong Dong <menglong8.dong@xxxxxxxxx> wrote:
> >
> > On Thu, Jan 20, 2022 at 12:17 PM Alexei Starovoitov
> > <alexei.starovoitov@xxxxxxxxx> wrote:
> > >
> > > On Thu, Jan 20, 2022 at 11:02:27AM +0800, Menglong Dong wrote:
> > > > Hello!
> > > >
> > > > On Thu, Jan 20, 2022 at 6:03 AM Alexei Starovoitov
> > > > <alexei.starovoitov@xxxxxxxxx> wrote:
> > > > >
> > > > [...]
> > > > >
> > > > > Looks like
> > > > > __sk_buff->remote_port
> > > > > bpf_sock_ops->remote_port
> > > > > sk_msg_md->remote_port
> > > > > are doing the right thing,
> > > > > but bpf_sock->dst_port is not correct?
> > > > >
> > > > > I think it's better to fix it,
> > > > > but probably need to consolidate it with
> > > > > convert_ctx_accesses() that deals with narrow access.
> > > > > I suspect reading u8 from three flavors of 'remote_port'
> > > > > won't be correct.
> > > >
> > > > What's the meaning of 'narrow access'? Do you mean to
> > > > make 'remote_port' u16? Or 'remote_port' should be made
> > > > accessible with u8? In fact, '*((u16 *)&skops->remote_port + 1)'
> > > > won't work, as it only is accessible with u32.
> > >
> > > u8 access to remote_port won't pass the verifier,
> > > but u8 access to dst_port will.
> > > Though it will return incorrect data.
> > > See how convert_ctx_accesses() handles narrow loads.
> > > I think we need to generalize it for different endian fields.
> >
> > Yeah, I understand narrower load in convert_ctx_accesses()
> > now. Seems u8 access to dst_port can't pass the verifier too,
> > which can be seen form bpf_sock_is_valid_access():
> >
> > $ switch (off) {
> > $ case offsetof(struct bpf_sock, state):
> > $ case offsetof(struct bpf_sock, family):
> > $ case offsetof(struct bpf_sock, type):
> > $ case offsetof(struct bpf_sock, protocol):
> > $ case offsetof(struct bpf_sock, dst_port): // u8 access is not allowed
> > $ case offsetof(struct bpf_sock, src_port):
> > $ case offsetof(struct bpf_sock, rx_queue_mapping):
> > $ case bpf_ctx_range(struct bpf_sock, src_ip4):
> > $ case bpf_ctx_range_till(struct bpf_sock, src_ip6[0], src_ip6[3]):
> > $ case bpf_ctx_range(struct bpf_sock, dst_ip4):
> > $ case bpf_ctx_range_till(struct bpf_sock, dst_ip6[0], dst_ip6[3]):
> > $ bpf_ctx_record_field_size(info, size_default);
> > $ return bpf_ctx_narrow_access_ok(off, size, size_default);
> > $ }
> >
> > I'm still not sure what should we do now. Should we make all
> > remote_port and dst_port narrower accessable and endianness
> > right? For example the remote_port in struct bpf_sock_ops:
> >
> > --- a/net/core/filter.c
> > +++ b/net/core/filter.c
> > @@ -8414,6 +8414,7 @@ static bool sock_ops_is_valid_access(int off, int size,
> > return false;
> > info->reg_type = PTR_TO_PACKET_END;
> > break;
> > + case bpf_ctx_range(struct bpf_sock_ops, remote_port):
>
> Ahh. bpf_sock_ops don't have it.
> But bpf_sk_lookup and sk_msg_md have it.
>
> bpf_sk_lookup->remote_port
> supports narrow access.
>
> When it accesses sport from bpf_sk_lookup_kern.
>
> and we have tests that do u8 access from remote_port.
> See verifier/ctx_sk_lookup.c
>
> > case offsetof(struct bpf_sock_ops, skb_tcp_flags):
> > bpf_ctx_record_field_size(info, size_default);
> > return bpf_ctx_narrow_access_ok(off, size,
> >
> > If remote_port/dst_port are made narrower accessable, the
> > result will be right. Therefore, *((u16*)&sk->remote_port) will
> > be the port with network byte order. And the port in host byte
> > order can be get with:
> > bpf_ntohs(*((u16*)&sk->remote_port))
> > or
> > bpf_htonl(sk->remote_port)
>
> So u8, u16, u32 will work if we make them narrow-accessible, right?
>
> The summary if I understood it:
> . only bpf_sk_lookup->remote_port is doing it correctly for u8,u16,u32 ?
> . bpf_sock->dst_port is not correct for u32,
> since it's missing bpf_ctx_range() ?
> . __sk_buff->remote_port
> bpf_sock_ops->remote_port
> sk_msg_md->remote_port
> correct for u32 access only. They don't support narrow access.
>
> but wait
> we have a test for bpf_sock->dst_port in progs/test_sock_fields.c.
> How does it work then?
>
> I think we need more eyes on the problem.
> cc-ing more experts.
iiuc, I think both bpf_sk_lookup and bpf_sock allow narrow access.
bpf_sock only allows ((__u8 *)&bpf_sock->dst_port)[0] but
not ((__u8 *)&bpf_sock->dst_port)[1]. bpf_sk_lookup allows reading
a byte at [0], [1], [2], and [3].

The test_sock_fields.c currently works because it is comparing
with another __u16: "sk->dst_port == srv_sa6.sin6_port".
It should also work with bpf_ntohS() which usually is what the
userspace program expects when dealing with port instead of using bpf_ntohl()?
Thus, I think we can keep the lower 16 bits way that bpf_sock->dst_port
and bpf_sk_lookup->remote_port (and also bpf_sock_addr->user_port ?) are
using. Also, changing it to the upper 16 bits will break existing
bpf progs.

For narrow access with any number of bytes at any offset may be useful
for IP[6] addr. Not sure about the port though. Ideally it should only
allow sizeof(__u16) read at offset 0. However, I think at this point it makes
sense to make them consistent with how bpf_sk_lookup does it also,
i.e. allow byte [0], [1], [2], and [3] access.

would love to hear how others think about it.