On 10/25/24 05:50, Philo Lu wrote:
On 2024/10/24 23:01, Paolo Abeni wrote:
On 10/18/24 13:45, Philo Lu wrote:
[...]
+/* In hash4, rehash can also happen in connect(), where hash4_cnt keeps unchanged. */
+static void udp4_rehash4(struct udp_table *udptable, struct sock *sk, u16 newhash4)
+{
+ struct udp_hslot *hslot4, *nhslot4;
+
+ hslot4 = udp_hashslot4(udptable, udp_sk(sk)->udp_lrpa_hash);
+ nhslot4 = udp_hashslot4(udptable, newhash4);
+ udp_sk(sk)->udp_lrpa_hash = newhash4;
+
+ if (hslot4 != nhslot4) {
+ spin_lock_bh(&hslot4->lock);
+ hlist_del_init_rcu(&udp_sk(sk)->udp_lrpa_node);
+ hslot4->count--;
+ spin_unlock_bh(&hslot4->lock);
+
+ synchronize_rcu();
This deserve a comment explaining why it's needed. I had to dig in past
revision to understand it.
Got it. And a short explanation here (see [1] for detail):
Here, we move a node from a hlist to another new one, i.e., update
node->next from the old hlist to the new hlist. For readers traversing
the old hlist, if we update node->next just when readers move onto the
moved node, then the readers also move to the new hlist. This is unexpected.
Reader(lookup) Writer(rehash)
----------------- ---------------
1. rcu_read_lock()
2. pos = sk;
3. hlist_del_init_rcu(sk, old_slot)
4. hlist_add_head_rcu(sk, new_slot)
5. pos = pos->next; <=
6. rcu_read_unlock()
[1]
https://lore.kernel.org/all/0fb425e0-5482-4cdf-9dc1-3906751f8f81@xxxxxxxxxxxxxxxxx/
Thanks. AFAICS the problem that such thing could cause is a lookup
failure for a socket positioned later in the same chain when a previous
entry is moved on a different slot during a concurrent lookup.
I think that could be solved the same way TCP is handling such scenario:
using hlist_null RCU list for the hash4 bucket, checking that a failed
lookup ends in the same bucket where it started and eventually
reiterating from the original bucket.
Have a look at __inet_lookup_established() for a more descriptive
reference, especially:
https://elixir.bootlin.com/linux/v6.12-rc4/source/net/ipv4/inet_hashtables.c#L528
...+
+
+/* call with sock lock */
+static void udp4_hash4(struct sock *sk)
+{
+ struct udp_hslot *hslot, *hslot2, *hslot4;
+ struct net *net = sock_net(sk);
+ struct udp_table *udptable;
+ unsigned int hash;
+
+ if (sk_unhashed(sk) || inet_sk(sk)->inet_rcv_saddr == htonl(INADDR_ANY))
+ return;
+
+ hash = udp_ehashfn(net, inet_sk(sk)->inet_rcv_saddr, inet_sk(sk)->inet_num,
+ inet_sk(sk)->inet_daddr, inet_sk(sk)->inet_dport);
+
+ udptable = net->ipv4.udp_table;
+ if (udp_hashed4(sk)) {
+ udp4_rehash4(udptable, sk, hash);
It's unclear to me how we can enter this branch. Also it's unclear why
here you don't need to call udp_hash4_inc()udp_hash4_dec, too. Why such
accounting can't be placed in udp4_rehash4()?
It's possible that a connected udp socket _re-connect_ to another remote
address. Then, because the local address is not changed, hash2 and its
hash4_cnt keep unchanged. But rehash4 need to be done.
I'll also add a comment here.
Right, UDP socket could actually connect() successfully twice in a row
without a disconnect in between...
I almost missed the point that the ipv6 implementation is planned to
land afterwards.
I'm sorry, but I think that would be problematic - i.e. if ipv4 support
will land in 6.13, but ipv6 will not make it - due to time constraints -
we will have (at least a release with inconsistent behavior between ipv4
and ipv6. I think it will be better bundle such changes together.