Re: Kernel leaks memory in ip6_dst_cache when suppress_prefix is present in ipv6 routing rules and a `fib` rule is present in ipv6 nftables rules

From: msizanoen
Date: Fri Oct 29 2021 - 20:25:52 EST


> exact command? I have not played with nftables.

sudo nft create table inet test
sudo nft create chain inet test test_chain '{ type filter hook prerouting priority filter + 10; policy accept; }'
sudo nft add rule inet test test_chain meta nfproto ipv6 fib saddr . mark . iif oif missing drop

> Do you have a stack
> trace of where the dst reference is getting taken?

        ip6_dst_alloc+5
        ip6_create_rt_rcu+107
        ip6_pol_route_lookup+741
        fib6_rule_action+707
        fib_rules_lookup+342
        fib6_rule_lookup+150
        nft_fib6_eval+354
        nft_do_chain+339
        nft_do_chain_inet+123
        nf_hook_slow+63
        nf_hook_slow_list+129
        ip6_sublist_rcv+606
        ipv6_list_rcv+296
        __netif_receive_skb_list_core+489
        netif_receive_skb_list_internal+433
        napi_complete_done+111
        virtnet_poll+771
        __napi_poll+42
        net_rx_action+547
        __softirqentry_text_start+208
        __irq_exit_rcu+199
        common_interrupt+131
        asm_common_interrupt+30
        native_safe_halt+11
        default_idle+10
        default_idle_call+53
        do_idle+487
        cpu_startup_entry+25
        secondary_startup_64_no_verify+194

Collected using the following bpftrace script:

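// Record the allocation stack of every dst returned by ip6_dst_alloc
kretfunc:ip6_dst_alloc { @[(uint64)retval] = kstack(); }
// Forget dsts that get destroyed; entries still in @ when bpftrace exits were never freed
kfunc:ip6_dst_destroy { delete(@[(uint64)args->dst]); }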

On 10/30/21 06:53, David Ahern wrote:
> On 10/26/21 8:24 AM, msizanoen wrote:
>> The kernel leaks memory when a `fib` rule is present in the IPv6 nftables
>> firewall rules and a suppress_prefixlength rule is present in the IPv6
>> routing rules (used by certain tools such as wg-quick). In such scenarios,
>> every incoming packet leaks one allocation in the ip6_dst_cache slab cache.
>>
>> After some hours of `bpftrace`-ing and source code reading, I tracked
>> down the issue to this commit:
>>     https://github.com/torvalds/linux/commit/ca7a03c4175366a92cee0ccc4fec0038c3266e26
>>
>> The problem with that patch is that the generic args->flags always has
>> FIB_LOOKUP_NOREF set [1][2], but the IPv6-specific flag
>> RT6_LOOKUP_F_DST_NOREF might not be, so fib6_rule_suppress does not drop
>> the reference when it should. This can be fixed by passing the
>> protocol-specific flags to the protocol-specific `suppress` function and
>> checking those `flags` for RT6_LOOKUP_F_DST_NOREF, instead of the generic
>> FIB_LOOKUP_NOREF, before decrementing the refcount.
>>
>> How to reproduce:
>> - Add the following nftables rule to a prerouting chain: `meta nfproto
>>   ipv6 fib saddr . mark . iif oif missing drop`
>
> exact command? I have not played with nftables. Do you have a stack
> trace of where the dst reference is getting taken?
>
>> - Run `sudo ip -6 rule add table main suppress_prefixlength 0`
>> - Watch the memory usage reported by `sudo slabtop -o | grep ip6_dst_cache`
>>   increase with every incoming IPv6 packet
>>
>> Example patch: https://gist.github.com/msizanoen1/36a2853467a9bd34fadc5bb3783fde0f
>>
>> [1]: https://github.com/torvalds/linux/blob/ca7a03c4175366a92cee0ccc4fec0038c3266e26/net/ipv6/fib6_rules.c#L71
>> [2]: https://github.com/torvalds/linux/blob/ca7a03c4175366a92cee0ccc4fec0038c3266e26/net/ipv6/fib6_rules.c#L99