Re: [PATCH v2] udp reuseport: fix packet of same flow hashed to different socket
From: Su Xuemin
Date: Wed Jun 08 2016 - 21:45:24 EST
On Wed 2016-06-08 at 08:43 -0700, Eric Dumazet wrote:
> I am not convinced this is the right way to fix the issue.
>
> Fact that you did not change udp4_lib_lookup2() is telling me something.
>
>
> 1) If host has 100 different IP, and 10 sockets bound to 0.0.0.0:53
> 2) If we receive datagrams from SRCIP:srcport send to IP{1..100}:53
> will all be hashed on same slot.
>
> See the hash used in soreuseport logic as an equivalent of a 4-tuple
> hash really, not a 3-tuple one.
>
This is my understanding of __udp4_lib_lookup(), see the comments below,
please kindly review it.
The problem of the current code is:
In Path 1, we pass daddr to udp_ehashfn() inside udp4_lib_lookup2().
In Path 2, we pass 0.0.0.0 to udp_ehashfn() inside udp4_lib_lookup2().
In Path 3, we pass daddr to udp_ehashfn(), even if inet->inet_rcv_saddr
is 0.0.0.0.
The hash value returned by udp_ehashfn() is used as a random seed to
select socket from the sockets of a hslot. In Path 2 and Path 3, this
hash value is different, so we will select different sockets.
Pass inet->inet_rcv_saddr to udp_ehashfn() in Path 3 will make the
logic consistent in all these three Pathes.
--------
...
if (hslot->count > 10) {
/* Xuemin:
* for udptable->hash2[], sockets bound to specific ip and
* sockets bound to 0.0.0.0 may be placed in different hslot,
* so we may have to search two hslots. */
...
/* Xuemin Path 1:
* for udptable->hash2[], firstly try to find the socket
* bound to the specific daddr. */
result = udp4_lib_lookup2(net, saddr, sport,
daddr, hnum, dif,
hslot2, slot2, skb);
if (!result) {
/* Xuemin Path 2:
* for udptable->hash2[], if we can not find the
* socket bound to the specific daddr, we then
* try to find the socket bound to 0.0.0.0. */
result = udp4_lib_lookup2(net, saddr, sport,
htonl(INADDR_ANY), hnum, dif,
hslot2, slot2, skb);
}
return result;
}
/* Xuemin:
* for udptable->hash[], all sockets bound to the same
* port(no matter they are bound to specific ip or to 0.0.0.0)
* are placed in the same hslot, so we only need to search one
* hslot. */
begin:
result = NULL;
badness = 0;
sk_for_each_rcu(sk, &hslot->head) {
score = compute_score(sk, net, saddr, hnum, sport,
daddr, dport, dif);
if (score > badness) {
reuseport = sk->sk_reuseport;
if (reuseport) {
/* Xuemin Path 3:
* when inet_sk(sk)->inet_rcv_saddr is 0.0.0.0,
* we should not pass daddr here. */
hash = udp_ehashfn(net, daddr, hnum,
saddr, sport);
result = reuseport_select_sock(sk, hash, skb,
sizeof(struct udphdr));