Re: [PATCH v5 net-next 3/3] ipv4/udp: Add 4-tuple hash for connected socket

From: Philo Lu
Date: Fri Oct 25 2024 - 21:45:36 EST

Next message: Rosen Penev: "Re: [PATCHv4 net-next] net: dsa: use ethtool string helpers"
Previous message: Ian Rogers: "[PATCH v1] perf build: Make libunwind opt-in rather than opt-out"
In reply to: Paolo Abeni: "Re: [PATCH v5 net-next 3/3] ipv4/udp: Add 4-tuple hash for connected socket"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On 2024/10/25 17:02, Paolo Abeni wrote:

On 10/25/24 05:50, Philo Lu wrote:

On 2024/10/24 23:01, Paolo Abeni wrote:

On 10/18/24 13:45, Philo Lu wrote:
[...]

+/* In hash4, rehash can also happen in connect(), where hash4_cnt keeps unchanged. */
+static void udp4_rehash4(struct udp_table *udptable, struct sock *sk, u16 newhash4)
+{
+	struct udp_hslot *hslot4, *nhslot4;
+
+	hslot4 = udp_hashslot4(udptable, udp_sk(sk)->udp_lrpa_hash);
+	nhslot4 = udp_hashslot4(udptable, newhash4);
+	udp_sk(sk)->udp_lrpa_hash = newhash4;
+
+	if (hslot4 != nhslot4) {
+		spin_lock_bh(&hslot4->lock);
+		hlist_del_init_rcu(&udp_sk(sk)->udp_lrpa_node);
+		hslot4->count--;
+		spin_unlock_bh(&hslot4->lock);
+
+		synchronize_rcu();

This deserves a comment explaining why it's needed. I had to dig into
past revisions to understand it.

Got it. Here is a short explanation (see [1] for details):

Here, we move a node from one hlist to another, i.e., we update
node->next to point into the new hlist instead of the old one. If a
reader traversing the old hlist is standing on the moved node at the
moment node->next is updated, the reader follows it into the new hlist
and never visits the rest of the old one. This is unexpected.

   Reader(lookup)          Writer(rehash)
   -----------------       ---------------
1. rcu_read_lock()
2. pos = sk;
3.                         hlist_del_init_rcu(sk, old_slot)
4.                         hlist_add_head_rcu(sk, new_slot)
5. pos = pos->next; <=
6. rcu_read_unlock()

[1]
https://lore.kernel.org/all/0fb425e0-5482-4cdf-9dc1-3906751f8f81@xxxxxxxxxxxxxxxxx/
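
The interleaving in the table above can be replayed deterministically in
user space. What follows is a single-threaded toy sketch, not kernel code:
struct node, move_node() and lookup_with_race() are hypothetical names, and
the writer is called from inside the reader loop only to force steps 3 and 4
to land between steps 2 and 5.

```c
#include <stddef.h>

/* Toy singly linked node; stands in for the hash4 hlist node. */
struct node {
	int id;
	struct node *next;
};

/*
 * Writer: unlink n from *old_head and push it onto *new_head with no
 * grace period in between. The unlink leaves n->next alone; the push
 * then overwrites it, which is what redirects a reader standing on n.
 */
static void move_node(struct node **old_head, struct node **new_head,
		      struct node *n)
{
	struct node **pp = old_head;

	while (*pp && *pp != n)
		pp = &(*pp)->next;
	if (*pp)
		*pp = n->next;

	n->next = *new_head;
	*new_head = n;
}

/*
 * Reader: walk the chain starting at *old_head looking for target.
 * If moved is non-NULL, the writer runs once while the reader stands
 * on that node (assumption: this models the race, it is not a real
 * concurrent execution). Returns 1 if target was found, 0 otherwise.
 */
static int lookup_with_race(struct node **old_head, struct node **new_head,
			    struct node *moved, int target)
{
	struct node *pos;

	for (pos = *old_head; pos; pos = pos->next) {
		if (pos->id == target)
			return 1;
		if (pos == moved) {
			move_node(old_head, new_head, moved);
			moved = NULL;
		}
	}
	return 0;
}
```

With an old chain 1 -> 2 -> 3 and node 2 moved while the reader stands on
it, the reader continues into the new chain and reports node 3 as missing
even though it never left the old chain.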

Thanks. AFAICS the problem this could cause is a lookup failure for a
socket positioned later in the same chain when an earlier entry is moved
to a different slot during a concurrent lookup.

Yes, you're right.

I think that could be solved the same way TCP handles this scenario:
using an hlist_nulls RCU list for the hash4 bucket, checking that a
failed lookup ends in the same bucket where it started, and restarting
from the original bucket if it does not.

Have a look at __inet_lookup_established() for a more descriptive
reference, especially:

https://elixir.bootlin.com/linux/v6.12-rc4/source/net/ipv4/inet_hashtables.c#L528

Thank you! I'll try it in the next version.
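
The nulls-marker idea suggested above can be sketched in user space as
follows. This is a simplified, single-threaded illustration under stated
assumptions, not the kernel's hlist_nulls API: struct table, make_nulls(),
nulls_slot(), add_head(), del() and lookup() are hypothetical names, the
race_at/race_to parameters are a test hook that moves one node while the
reader stands on it, and the re-validation of the key after taking a
reference that the real TCP lookup performs is omitted.

```c
#include <stddef.h>
#include <stdint.h>

#define NSLOTS 4

struct node {
	int key;
	struct node *next;	/* a real node, or an odd "nulls" marker */
};

struct table {
	struct node *head[NSLOTS];
};

/* End-of-chain marker carrying the slot index; low bit set, never dereferenced. */
static struct node *make_nulls(unsigned int slot)
{
	return (struct node *)(((uintptr_t)slot << 1) | 1);
}

static int is_nulls(const struct node *p)
{
	return (int)((uintptr_t)p & 1);
}

static unsigned int nulls_slot(const struct node *p)
{
	return (unsigned int)((uintptr_t)p >> 1);
}

static void table_init(struct table *t)
{
	for (unsigned int i = 0; i < NSLOTS; i++)
		t->head[i] = make_nulls(i);
}

static void add_head(struct table *t, unsigned int slot, struct node *n)
{
	n->next = t->head[slot];
	t->head[slot] = n;
}

/* Unlink n from its chain; n->next is left intact for readers. */
static void del(struct table *t, unsigned int slot, struct node *n)
{
	struct node **pp = &t->head[slot];

	while (!is_nulls(*pp) && *pp != n)
		pp = &(*pp)->next;
	if (*pp == n)
		*pp = n->next;
}

/*
 * Look up key in slot. If the walk ends on a nulls marker that belongs
 * to a different slot, the reader was carried into another chain, so
 * it restarts from the original bucket. *restarts counts those retries.
 */
static struct node *lookup(struct table *t, unsigned int slot, int key,
			   struct node *race_at, unsigned int race_to,
			   int *restarts)
{
	struct node *pos;

begin:
	for (pos = t->head[slot]; !is_nulls(pos); pos = pos->next) {
		if (pos->key == key)
			return pos;
		if (pos == race_at) {	/* test hook: simulated rehash */
			del(t, slot, pos);
			add_head(t, race_to, pos);
			race_at = NULL;
		}
	}
	if (nulls_slot(pos) != slot) {
		(*restarts)++;
		goto begin;
	}
	return NULL;
}
```

Because the marker identifies the bucket the walk ended in, a miss is only
trusted when the reader finished in the chain it started from, which avoids
the need for a grace period between the delete and the re-insert.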

+

...

+
+/* call with sock lock */
+static void udp4_hash4(struct sock *sk)
+{
+	struct udp_hslot *hslot, *hslot2, *hslot4;
+	struct net *net = sock_net(sk);
+	struct udp_table *udptable;
+	unsigned int hash;
+
+	if (sk_unhashed(sk) || inet_sk(sk)->inet_rcv_saddr == htonl(INADDR_ANY))
+		return;
+
+	hash = udp_ehashfn(net, inet_sk(sk)->inet_rcv_saddr, inet_sk(sk)->inet_num,
+			   inet_sk(sk)->inet_daddr, inet_sk(sk)->inet_dport);
+
+	udptable = net->ipv4.udp_table;
+	if (udp_hashed4(sk)) {
+		udp4_rehash4(udptable, sk, hash);

It's unclear to me how we can enter this branch. It's also unclear why
you don't need to call udp_hash4_inc()/udp_hash4_dec() here, too. Why
can't that accounting be placed in udp4_rehash4()?

It's possible for a connected udp socket to _re-connect_ to another
remote address. In that case the local address is unchanged, so hash2
and its hash4_cnt stay the same, but rehash4 still needs to be done.
I'll also add a comment here.

Right, a UDP socket can actually connect() successfully twice in a row
without a disconnect in between...
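
The re-connect case can be illustrated with a small model. Everything here
is an assumption made for illustration: mix() is a hypothetical toy hash
and not the kernel's udp_ehashfn(), and struct toy_sock, struct toy_slot2
and toy_connect() only mimic the accounting described above (hash2 derived
from the local address and port, hash4 from the full 4-tuple, hash4_cnt
kept per hash2 slot).

```c
#include <stdint.h>

/* Hypothetical toy mixer; each step is invertible, so distinct inputs
 * with the same running state give distinct outputs. */
static uint32_t mix(uint32_t h, uint32_t v)
{
	h ^= v;
	h *= 0x9e3779b1u;
	return h ^ (h >> 15);
}

/* Stand-in for the hash2 slot that owns the hash4 counter. */
struct toy_slot2 {
	unsigned int hash4_cnt;
};

struct toy_sock {
	uint32_t laddr, raddr;
	uint16_t lport, rport;
	uint32_t hash2;		/* local address + local port */
	uint32_t hash4;		/* full 4-tuple */
	int hashed4;		/* already present in the hash4 table? */
};

static uint32_t toy_hash2(const struct toy_sock *sk)
{
	return mix(mix(0, sk->laddr), sk->lport);
}

static uint32_t toy_hash4(const struct toy_sock *sk)
{
	return mix(mix(toy_hash2(sk), sk->raddr), sk->rport);
}

/*
 * Model of connect(): the remote side changes, the local side does not.
 * On the first connect the socket is added to hash4 and the counter is
 * bumped; on a re-connect only the hash4 value (and thus the slot)
 * changes, and the counter is left alone.
 */
static void toy_connect(struct toy_sock *sk, struct toy_slot2 *slot2,
			uint32_t raddr, uint16_t rport)
{
	sk->raddr = raddr;
	sk->rport = rport;
	sk->hash2 = toy_hash2(sk);
	sk->hash4 = toy_hash4(sk);

	if (!sk->hashed4) {
		sk->hashed4 = 1;
		slot2->hash4_cnt++;
	}
}
```

Calling toy_connect() twice with different remote addresses leaves hash2
and hash4_cnt untouched while hash4 changes, which is the situation the
udp_hashed4() branch handles.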

I almost missed the point that the ipv6 implementation is planned to
land afterwards.

I'm sorry, but I think that would be problematic: if ipv4 support lands
in 6.13 but ipv6 does not make it, due to time constraints, we will have
at least one release with inconsistent behavior between ipv4 and ipv6.
I think it would be better to bundle these changes together.

No problem. I can add ipv6 support in the next version too.

Thanks.
--
Philo
