Re: [PATCH] netfilter: conntrack: fix clash resolution in nat
From: Pablo Neira Ayuso
Date: Thu Jun 29 2017 - 12:45:51 EST
Hi,
On Wed, Jun 14, 2017 at 04:11:23PM +0800, Haishuang Yan wrote:
> In our openstack environment, slow dns lookup for hostname when
> parallel dns requests for IPv4 and IPv6 addresses from VM, the
> second IPv6 request(AAAA record) is dropped on its way in compute
> node.
>
> We found many similar related links:
> https://bbs.archlinux.org/viewtopic.php?id=75770
> http://www.spinics.net/lists/netfilter-devel/msg15860.html
> https://www.spinics.net/lists/netfilter-devel/msg37565.html
>
> After the commit 71d8c47fc653 ("netfilter: conntrack: introduce clash
> resolution on insertion race") can fix packet drop, the second IPv6
> request packet would be forwarded by compute node, but the udp source
> port is bogus, start from 1024:
> 14:52:10.868485 IP 10.136.5.132.55785 > 10.136.5.1.domain: 3957+ A?
> www.baidu.com. (31)
> 14:52:10.868544 IP 10.136.5.132.1024 > 10.136.5.1.domain: 13826+ AAAA?
> www.baidu.com. (31)
>
> And after the commit 590b52e10d41 ("netfilter: conntrack: skip clash
> resolution if nat is in place") , it exclude nat situation, but the
> packet drops issue come back again.
>
> This patch revert the last commit. If l4proto allow clash but the tuple is
> used by the conntrack that wins race, the original tuple can be reused.
>
> Fixes: 590b52e10d41 ("netfilter: conntrack: skip clash resolution if 25
> nat is in place")
> Signed-off-by: Haishuang Yan <yanhaishuang@xxxxxxxxxxxxxxxxxxxx>
> ---
> net/netfilter/nf_conntrack_core.c | 1 -
> net/netfilter/nf_nat_core.c | 4 +++-
> 2 files changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
> index e847dba..7e16518 100644
> --- a/net/netfilter/nf_conntrack_core.c
> +++ b/net/netfilter/nf_conntrack_core.c
> @@ -699,7 +699,6 @@ static int nf_ct_resolve_clash(struct net *net, struct sk_buff *skb,
>
> l4proto = __nf_ct_l4proto_find(nf_ct_l3num(ct), nf_ct_protonum(ct));
> if (l4proto->allow_clash &&
> - ((ct->status & IPS_NAT_DONE_MASK) == 0) &&
> !nf_ct_is_dying(ct) &&
> atomic_inc_not_zero(&ct->ct_general.use)) {
> enum ip_conntrack_info oldinfo;
> diff --git a/net/netfilter/nf_nat_core.c b/net/netfilter/nf_nat_core.c
> index 6c72922..9edfca2 100644
> --- a/net/netfilter/nf_nat_core.c
> +++ b/net/netfilter/nf_nat_core.c
> @@ -328,6 +328,7 @@ static int nf_nat_bysource_cmp(struct rhashtable_compare_arg *arg,
> const struct nf_conntrack_zone *zone;
> const struct nf_nat_l3proto *l3proto;
> const struct nf_nat_l4proto *l4proto;
> + const struct nf_conntrack_l4proto *ct_l4proto;
> struct net *net = nf_ct_net(ct);
>
> zone = nf_ct_zone(ct);
> @@ -336,6 +337,7 @@ static int nf_nat_bysource_cmp(struct rhashtable_compare_arg *arg,
> l3proto = __nf_nat_l3proto_find(orig_tuple->src.l3num);
> l4proto = __nf_nat_l4proto_find(orig_tuple->src.l3num,
> orig_tuple->dst.protonum);
> + ct_l4proto = __nf_ct_l4proto_find(nf_ct_l3num(ct), nf_ct_protonum(ct));
>
> /* 1) If this srcip/proto/src-proto-part is currently mapped,
> * and that same mapping gives a unique tuple within the given
> @@ -349,7 +351,7 @@ static int nf_nat_bysource_cmp(struct rhashtable_compare_arg *arg,
> !(range->flags & NF_NAT_RANGE_PROTO_RANDOM_ALL)) {
> /* try the original tuple first */
> if (in_range(l3proto, l4proto, orig_tuple, range)) {
> - if (!nf_nat_used_tuple(orig_tuple, ct)) {
> + if (!nf_nat_used_tuple(orig_tuple, ct) || ct_l4proto->allow_clash) {
Hm, this is defeating clash resolution logic entirely. I mean, with
this this would not call nf_nat_l4proto_unique_tuple() so we get a
unique tuple.
My concern with this is that, although this is fixing the issue for
you, I suspect this is breaking SNAT/masquerade in case we get two
clients in the LAN if both flow have a tuple that matches this:
*, srcport, dst-IP, dst-port
The problem that we have with this is a race condition, the problem is
that the second packet doesn't find a confirmed conntrack in the
hashes, so it gets its own one and NAT makes sure such tuple is
unique.
So far, the only way I see to fix this is to postpone packet mangling
that NAT performs after the clash resolution code in
nf_conntrack_confirm(). But that changes the behaviour in a way that
would break matching rules that assume the existing behaviour, that
is, nf_nat_setup() mangles the packet.
> *tuple = *orig_tuple;
> goto out;
> }
> --
> 1.8.3.1
>
>
>