Re: [PATCH v5 net-next 3/3] ipv4/udp: Add 4-tuple hash for connected socket

From: Philo Lu
Date: Thu Oct 24 2024 - 23:50:58 EST

On 2024/10/24 23:01, Paolo Abeni wrote:
On 10/18/24 13:45, Philo Lu wrote:
[...]
+/* In hash4, rehash can also happen in connect(), where hash4_cnt remains unchanged. */
+static void udp4_rehash4(struct udp_table *udptable, struct sock *sk, u16 newhash4)
+{
+	struct udp_hslot *hslot4, *nhslot4;
+
+	hslot4 = udp_hashslot4(udptable, udp_sk(sk)->udp_lrpa_hash);
+	nhslot4 = udp_hashslot4(udptable, newhash4);
+	udp_sk(sk)->udp_lrpa_hash = newhash4;
+
+	if (hslot4 != nhslot4) {
+		spin_lock_bh(&hslot4->lock);
+		hlist_del_init_rcu(&udp_sk(sk)->udp_lrpa_node);
+		hslot4->count--;
+		spin_unlock_bh(&hslot4->lock);
+
+		synchronize_rcu();

This deserves a comment explaining why it's needed. I had to dig into past
revisions to understand it.


Got it. Here is a short explanation (see [1] for details):

Here we move a node from one hlist to another, i.e., node->next is updated from pointing into the old hlist to pointing into the new one. If a reader traversing the old hlist happens to be standing on the moved node exactly when node->next is updated, it follows the pointer into the new hlist and continues there, which is unexpected.

   Reader (lookup)                     Writer (rehash)
   ---------------                     ---------------
1. rcu_read_lock()
2. pos = sk;
                                    3. hlist_del_init_rcu(sk, old_slot)
                                    4. hlist_add_head_rcu(sk, new_slot)
5. pos = pos->next;  <= follows the moved node into new_slot
6. rcu_read_unlock()

[1]
https://lore.kernel.org/all/0fb425e0-5482-4cdf-9dc1-3906751f8f81@xxxxxxxxxxxxxxxxx/
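
To make the hazard concrete, here is a minimal reader-side sketch (walk_hslot4_sketch() is a made-up illustration, not the patch's lookup function; a real lookup matches the 4-tuple of each entry instead of counting them):

#include <linux/rculist.h>
#include <net/udp.h>

/* Walk one hash4 slot under RCU, which is what a lookup does while
 * comparing the 4-tuple of each entry.
 */
static unsigned int walk_hslot4_sketch(struct udp_hslot *hslot4)
{
	struct udp_sock *up;
	unsigned int n = 0;

	rcu_read_lock();
	hlist_for_each_entry_rcu(up, &hslot4->head, udp_lrpa_node) {
		/* Step 5 of the diagram is the ->next dereference hidden in
		 * this loop: if the entry we are standing on is re-linked
		 * into another slot right now, we silently continue on the
		 * new chain and miss the tail of the old one. The
		 * synchronize_rcu() between hlist_del_init_rcu() and
		 * hlist_add_head_rcu() in udp4_rehash4() makes sure every
		 * such reader has finished before the entry is re-linked.
		 */
		n++;
	}
	rcu_read_unlock();
	return n;
}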

+
+		spin_lock_bh(&nhslot4->lock);
+		hlist_add_head_rcu(&udp_sk(sk)->udp_lrpa_node, &nhslot4->head);
+		nhslot4->count++;
+		spin_unlock_bh(&nhslot4->lock);
+	}
+}
+
+static void udp4_unhash4(struct udp_table *udptable, struct sock *sk)
+{
+	struct udp_hslot *hslot2, *hslot4;
+
+	if (udp_hashed4(sk)) {
+		hslot2 = udp_hashslot2(udptable, udp_sk(sk)->udp_portaddr_hash);
+		hslot4 = udp_hashslot4(udptable, udp_sk(sk)->udp_lrpa_hash);
+
+		spin_lock(&hslot4->lock);
+		hlist_del_init_rcu(&udp_sk(sk)->udp_lrpa_node);
+		hslot4->count--;
+		spin_unlock(&hslot4->lock);
+
+		spin_lock(&hslot2->lock);
+		udp_hash4_dec(hslot2);
+		spin_unlock(&hslot2->lock);
+	}
+}
+
+/* call with sock lock */
+static void udp4_hash4(struct sock *sk)
+{
+	struct udp_hslot *hslot, *hslot2, *hslot4;
+	struct net *net = sock_net(sk);
+	struct udp_table *udptable;
+	unsigned int hash;
+
+	if (sk_unhashed(sk) || inet_sk(sk)->inet_rcv_saddr == htonl(INADDR_ANY))
+		return;
+
+	hash = udp_ehashfn(net, inet_sk(sk)->inet_rcv_saddr, inet_sk(sk)->inet_num,
+			   inet_sk(sk)->inet_daddr, inet_sk(sk)->inet_dport);
+
+	udptable = net->ipv4.udp_table;
+	if (udp_hashed4(sk)) {
+		udp4_rehash4(udptable, sk, hash);

It's unclear to me how we can enter this branch. It's also unclear why you
don't need to call udp_hash4_inc()/udp_hash4_dec() here, too. Why can't such
accounting be placed in udp4_rehash4()?


It's possible for a connected udp socket to _re-connect_ to another remote address. In that case the local address is unchanged, so hash2 and its hash4_cnt stay the same, but rehash4 still needs to be done.

I'll also add a comment here.
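
For illustration, the userspace pattern that reaches this branch looks like the following (the loopback addresses and ports are just placeholders): the second connect() keeps the local address and port, so hash2 and hash4_cnt stay put, but the remote 4-tuple changes and the socket has to be moved to a new hash4 slot by udp4_rehash4():

#include <arpa/inet.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
	struct sockaddr_in dst = { .sin_family = AF_INET, .sin_port = htons(5000) };
	int fd = socket(AF_INET, SOCK_DGRAM, 0);

	if (fd < 0)
		return 1;

	inet_pton(AF_INET, "127.0.0.1", &dst.sin_addr);
	connect(fd, (struct sockaddr *)&dst, sizeof(dst));	/* first 4-tuple insert */

	inet_pton(AF_INET, "127.0.0.2", &dst.sin_addr);
	connect(fd, (struct sockaddr *)&dst, sizeof(dst));	/* re-connect: udp_hashed4()
								 * is already true, so only
								 * udp4_rehash4() runs */
	close(fd);
	return 0;
}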

[...]
@@ -2031,6 +2180,19 @@ void udp_lib_rehash(struct sock *sk, u16 newhash)
 				spin_unlock(&nhslot2->lock);
 			}
+			if (udp_hashed4(sk)) {
+				udp4_rehash4(udptable, sk, newhash4);
+
+				if (hslot2 != nhslot2) {
+					spin_lock(&hslot2->lock);
+					udp_hash4_dec(hslot2);
+					spin_unlock(&hslot2->lock);
+
+					spin_lock(&nhslot2->lock);
+					udp_hash4_inc(nhslot2);
+					spin_unlock(&nhslot2->lock);
+				}
+			}
 			spin_unlock_bh(&hslot->lock);

The udp4_rehash4() call above is in atomic context and could end up calling
synchronize_rcu(), which is a blocking function. You must avoid that.


I see, synchronize_rcu() cannot be called while holding a spinlock. However, I don't have a good idea of how to solve this yet. Do you have any thoughts or suggestions?
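
For reference, a minimal sketch of the constraint (demo_lock and broken_pattern() are made-up names, not patch code): synchronize_rcu() may block for a full grace period, so calling it with a spinlock held, as udp4_rehash4() would do here under hslot->lock, is a sleeping-while-atomic bug:

#include <linux/rcupdate.h>
#include <linux/spinlock.h>

static DEFINE_SPINLOCK(demo_lock);

static void broken_pattern(void)
{
	spin_lock_bh(&demo_lock);	/* BH/preemption disabled: atomic context   */
	synchronize_rcu();		/* may sleep -> "sleeping function called
					 * from invalid context" splat             */
	spin_unlock_bh(&demo_lock);
}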

Cheers,

Paolo

Thanks for the review, Paolo. I'll address all your concerns in the next version.

--
Philo