Kernel leaks memory in ip6_dst_cache when suppress_prefix is present in ipv6 routing rules and a `fib` rule is present in ipv6 nftables rules
From: msizanoen
Date: Tue Oct 26 2021 - 10:37:02 EST
The kernel leaks memory when a `fib` rule is present in the IPv6 nftables firewall rules and a suppress_prefixlength
rule is present in the IPv6 routing rules (as set up by certain tools such as wg-quick). In such scenarios, every
incoming packet leaks an allocation in the ip6_dst_cache slab cache.
After some hours of `bpftrace`-ing and source code reading, I tracked down the issue to this commit:
https://github.com/torvalds/linux/commit/ca7a03c4175366a92cee0ccc4fec0038c3266e26
The problem with that patch is that the generic args->flags always has FIB_LOOKUP_NOREF set[1][2], but the
IPv6-specific flag RT6_LOOKUP_F_DST_NOREF might not be specified, so fib6_rule_suppress does not
decrease the refcount when it needs to. This can be fixed by exposing the protocol-specific flags to the
protocol-specific `suppress` function, and checking the protocol-specific `flags` argument for
RT6_LOOKUP_F_DST_NOREF instead of the generic FIB_LOOKUP_NOREF when decreasing the refcount.
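To make the flag mismatch concrete, here is a small user-space model of the refcount logic described above. It is only a sketch and not kernel code: the `model_*` names are made up for illustration, the flag values are placeholders rather than copies of the kernel headers, and a single counter stands in for the per-packet dst allocations.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder values for this model only; see the kernel headers for the
 * real definitions. */
#define FIB_LOOKUP_NOREF        0x1
#define RT6_LOOKUP_F_DST_NOREF  0x80

/* Hypothetical stand-in for a dst entry: just a reference counter. */
struct model_dst { int refcnt; };

/* Lookup takes a reference unless the IPv6-specific NOREF flag is set. */
static void model_lookup(struct model_dst *dst, int rt6_flags)
{
    if (!(rt6_flags & RT6_LOOKUP_F_DST_NOREF))
        dst->refcnt++;
}

static void model_release(struct model_dst *dst)
{
    assert(dst->refcnt > 0);
    dst->refcnt--;
}

/* Behaviour described in the report: the suppress path only sees the
 * generic flags, where FIB_LOOKUP_NOREF is always set, so it never
 * drops the reference. */
static void model_suppress_buggy(struct model_dst *dst, int fib_flags)
{
    if (!(fib_flags & FIB_LOOKUP_NOREF))
        model_release(dst);
}

/* Proposed behaviour: decide based on the protocol-specific flags. */
static void model_suppress_fixed(struct model_dst *dst, int rt6_flags)
{
    if (!(rt6_flags & RT6_LOOKUP_F_DST_NOREF))
        model_release(dst);
}

/* Simulate `packets` suppressed lookups and return the number of
 * references that were taken but never released. */
int model_leaked_refs(int packets, int rt6_flags, bool fixed)
{
    struct model_dst dst = { .refcnt = 0 };
    const int fib_flags = FIB_LOOKUP_NOREF; /* always set, per the report */

    for (int i = 0; i < packets; i++) {
        model_lookup(&dst, rt6_flags);
        if (fixed)
            model_suppress_fixed(&dst, rt6_flags);
        else
            model_suppress_buggy(&dst, fib_flags);
    }
    return dst.refcnt;
}
```

In this model, `model_leaked_refs(n, 0, false)` returns `n` (one leaked reference per packet), while the same call with `fixed` set to `true`, or with RT6_LOOKUP_F_DST_NOREF passed in `rt6_flags`, returns 0.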
How to reproduce:
- Add the following nftables rule to a prerouting chain: `meta nfproto ipv6 fib saddr . mark . iif oif missing drop`
- Run `sudo ip -6 rule add table main suppress_prefixlength 0`
- Watch `sudo slabtop -o | grep ip6_dst_cache` memory usage increase with every incoming ipv6 packet
Example patch: https://gist.github.com/msizanoen1/36a2853467a9bd34fadc5bb3783fde0f
[1]: https://github.com/torvalds/linux/blob/ca7a03c4175366a92cee0ccc4fec0038c3266e26/net/ipv6/fib6_rules.c#L71
[2]: https://github.com/torvalds/linux/blob/ca7a03c4175366a92cee0ccc4fec0038c3266e26/net/ipv6/fib6_rules.c#L99