[RFC PATCH 0/3] Allow sk_lookup UDP return traffic to egress.

From: Tiago Lam
Date: Fri Sep 13 2024 - 05:40:45 EST


Currently, sk_lookup allows an ebpf program to run on the ingress socket
lookup path, and accept traffic not only on a range of addresses, but
also on a range of ports. At Cloudflare we use sk_lookup for two main
cases:
1. Sharing a single port between multiple services - i.e. two services
(or more) use disjoint IP ranges but share the same port;
2. Receiving traffic on all ports - i.e. a service which accepts traffic
on specific IP ranges but any port [1].

However, one main challenge we face while using sk_lookup for these use
cases is how to source return UDP traffic:
- On point 1. above, sometimes this range of addresses are not local
(i.e. there's no local routes for these in the server), which means we
need IP_TRANSPARENT set to be able to egress traffic from addresses
we've received traffic on (or simply IP_FREEBIND in the case of IPv6);
- And on point 2. above, allowing traffic to a range of ports means a
service could get traffic on multiple ports, but currently there's no
way to set the source UDP port egress traffic should be sourced from -
it's possible to receive the original destination port using the
IP_ORIGDSTADDR ancilliary message in recvmsg, but not set it in
sendmsg.

Both of these limitations can be worked around, but in a sub-optimal
way. Using IP_TRANSPARENT, for instance, requires special privileges.
And while one could use UDP connected sockets to send return traffic,
creating a connected socket for each different address a UDP traffic is
received on does have performance implications.

Given sk_lookup allows services to accept traffic on a range of
addresses or ports, it seems sensible to also allow return traffic to
proceed through as well, without needing extra configurations / set ups.

This patch set allows to do exactly this by performing a reverse socket
lookup on the egress path - where it looks to see if the egress socket
matches a socket in the attached sk_lookup ebpf program for the traffic
that's being sent. If it does, traffic is allowed to proceed.

The downsides to this is that this runs on the egress hot path, although
this work tries to minimise its impact by only performing the reverse
socket lookup when necessary. Further performance measurements are to be
taken, but we're reaching out early for feedback to see what the
technical concerns are and if we can address them.

[1] https://blog.cloudflare.com/how-we-built-spectrum/

Suggested-by: Jakub Sitnicki <jakub@xxxxxxxxxxxxxx>
Signed-off-by: Tiago Lam <tiagolam@xxxxxxxxxxxxxx>
---
Tiago Lam (3):
ipv4: Run a reverse sk_lookup on sendmsg.
ipv6: Run a reverse sk_lookup on sendmsg.
bpf: Add sk_lookup test to use ORIGDSTADDR cmsg.

include/net/ip.h | 1 +
net/ipv4/ip_sockglue.c | 11 ++++
net/ipv4/udp.c | 33 +++++++++-
net/ipv6/datagram.c | 76 ++++++++++++++++++++++
net/ipv6/udp.c | 8 ++-
tools/testing/selftests/bpf/prog_tests/sk_lookup.c | 70 +++++++++++++-------
6 files changed, 174 insertions(+), 25 deletions(-)
---
base-commit: da3ea35007d0af457a0afc87e84fddaebc4e0b63
change-id: 20240909-reverse-sk-lookup-f7bf36292bc4

Best regards,
--
Tiago Lam <tiagolam@xxxxxxxxxxxxxx>