Re: [RFC PATCH net-next] net/udp: Add 4-tuple hash for connected socket

From: Eric Dumazet
Date: Mon Sep 23 2024 - 05:20:14 EST


On Mon, Sep 23, 2024 at 10:40 AM Philo Lu <lulie@xxxxxxxxxxxxxxxxx> wrote:
>
> Hi Eric, sorry for the late response.
>
> On 2024/9/13 19:49, Eric Dumazet wrote:
> > On Fri, Sep 13, 2024 at 12:09 PM Philo Lu <lulie@xxxxxxxxxxxxxxxxx> wrote:
> >>
> >> This RFC patch introduces 4-tuple hash for connected udp sockets, to
> >> make udp lookup faster. It is a tentative proposal and any comment is
> >> welcome.
> >>
> >> Currently, the udp_table has two hash table, the port hash and portaddr
> >> hash. But for UDP server, all sockets have the same local port and addr,
> >> so they are all on the same hash slot within a reuseport group. And the
> >> target sock is selected by scoring.
> >>
> >> In some applications, the UDP server uses connect() for each incoming
> >> client, and then the socket (fd) is used exclusively by the client. In
> >> such scenarios, current scoring method can be ineffcient with a large
> >> number of connections, resulting in high softirq overhead.
> >>
> >> To solve the problem, a 4-tuple hash list is added to udp_table, and is
> >> updated when calling connect(). Then __udp4_lib_lookup() firstly
> >> searches the 4-tuple hash list, and return directly if success. A new
> >> sockopt UDP_HASH4 is added to enable it. So the usage is:
> >> 1. socket()
> >> 2. bind()
> >> 3. setsockopt(UDP_HASH4)
> >> 4. connect()
> >>
> >> AFAICT the patch (if useful) can be further improved by:
> >> (a) Support disable with sockopt UDP_HASH4. Now it cannot be disabled
> >> once turned on until the socket closed.
> >> (b) Better interact with hash2/reuseport. Now hash4 hardly affects other
> >> mechanisms, but maintaining sockets in both hash4 and hash2 lists seems
> >> unnecessary.
> >> (c) Support early demux and ipv6.
> >>
> >> Signed-off-by: Philo Lu <lulie@xxxxxxxxxxxxxxxxx>
> >
> > Adding a 4-tuple hash for UDP has been discussed in the past.
> >
> > Main issue is that this is adding one cache line miss per incoming packet.
> >
>
> Thanks to Dust's idea, we can create a new field for hslot2 (or create a
> new struct for hslot2), indicating whether there are connected sockets
> in this hslot (i.e., local port and local address), and run hash4 lookup
> only when it's true. Then there would be no cache line miss.
>
> The detailed patch is attached below.
>
> > Most heavy duty UDP servers (DNS, QUIC), use non connected sockets,
> > because having one million UDP sockets has huge kernel memory cost,
> > not counting poor cache locality.
>
> Some of our applications do use connected UDP sockets (~10,000 conns),
> and will get significant benefits from hash4. We use connect() to
> separate receiving sockets and listening ones, and then it's easier to
> manage them (just like TCP), especially during live-upgrading, such as
> nginx reload. Besides, I believe hash4 is harmless to those servers
> without connected sockets.
>
> Suggestions are always welcome, and I'll keep improving this patch.
>
> Thanks.
>
> ---
> include/net/udp.h | 3 +++
> net/ipv4/udp.c | 17 ++++++++++++-----
> 2 files changed, 15 insertions(+), 5 deletions(-)
>
> diff --git a/include/net/udp.h b/include/net/udp.h
> index a05d79d35fbba..bec04c0e753d0 100644
> --- a/include/net/udp.h
> +++ b/include/net/udp.h
> @@ -54,11 +54,14 @@ struct udp_skb_cb {
> *
> * @head: head of list of sockets
> * @count: number of sockets in 'head' list
> + * @hash4_cnt: number of sockets in 'hash4' table of the same (local
> port, local address),
> + * Only used by hash2.
> * @lock: spinlock protecting changes to head/count
> */
> struct udp_hslot {
> struct hlist_head head;
> int count;
> + u32 hash4_cnt;
> spinlock_t lock;
> } __attribute__((aligned(2 * sizeof(long))));

This would double the size of this structure (from 16 to 32 bytes)

Perhaps add another 'struct udp_hslot_main' for the relevant hash table,
and keep the smaller 'struct udp_hslot' for the two others.

Current cumulative size of the hash tables is 2 MB.

Alternatively we could move the locks out of the structure, this is
not used in the fast path.