Re: [PATCH net-next,v5 0/9] netfilter: flowtable bridge and vlan enhancements
From: Alexander Lobakin
Date: Sun Nov 22 2020 - 05:26:50 EST
From: Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx>
Date: Fri, 20 Nov 2020 13:49:12 +0100
> Hi,
>
> The following patchset augments the Netfilter flowtable fastpath to
> support network topologies that combine IP forwarding, bridge and
> VLAN devices.
>
> This v5 includes updates for:
>
> - Patch #2: fix incorrect xmit type in IPv6 path, per Florian Westphal.
> - Patch #3: fix possible off by one in dev_fill_forward_path() stack logic,
> per Florian Westphal.
> - Patch #7: add a note to patch description to specify that FDB topology
> updates are not supported at this stage, per Jakub Kicinski.
>
> A typical scenario that can benefit from this infrastructure is composed
> of several VMs connected to bridge ports where the bridge master device
> 'br0' has an IP address. A DHCP server is also assumed to be running to
> provide connectivity to the VMs. The VMs reach the Internet through
> 'br0' as the default gateway, which makes packets enter the IP forwarding
> path. Then, netfilter is used to NAT the packets before they leave
> through the WAN device.
>
> Something like this:
>
>                        fast path
>                 .------------------------.
>                /                          \
>                |          IP forwarding    |
>                |         /             \   .
>                |      br0               eth0
>                .      / \
>                -- veth1  veth2
>                     .
>                     .
>                     .
>                    eth0
>              ab:cd:ef:ab:cd:ef
>                     VM

I'm concerned about bypassing the VLAN and bridge .ndo_start_xmit()
callbacks with this shortcut. We'll have incomplete netdevice Tx stats
for these two, as the stats are updated inside these callbacks.

> The idea is to accelerate forwarding by building a fast path that takes
> packets from the ingress path of the bridge port and places them in the
> egress path of the WAN device (and vice versa), hence skipping the
> classic bridge and IP stack paths.
>
> This patchset is composed of:
>
> Patch #1 adds a placeholder for the hash calculation, instead of using
> the dir field.
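A minimal C sketch of the idea behind patch #1, assuming the placeholder marks where the hashed key of the tuple ends, so that fields placed after it (for example the direction, or a transmit type) can be added without changing the hash. The struct layout, the `hash_mark` member and the use of FNV-1a are hypothetical stand-ins, not the flowtable code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flow tuple: lookup key first, non-key fields after. */
struct tuple_sketch {
	uint32_t src_v4, dst_v4;
	uint16_t src_port, dst_port;
	uint8_t l4proto;
	uint8_t pad[3];		/* explicit padding so hashed bytes are defined */
	/* ---- fields below this point are not part of the key ---- */
	uint8_t hash_mark;	/* placeholder: hashing stops at its offset */
	uint8_t dir;		/* direction, excluded from the hash */
};

/* Number of leading bytes that form the hashed key. */
#define TUPLE_HASH_LEN offsetof(struct tuple_sketch, hash_mark)

/* FNV-1a over the key bytes only (a stand-in for the real hash). */
uint32_t tuple_hash(const struct tuple_sketch *t)
{
	const uint8_t *p = (const uint8_t *)t;
	uint32_t h = 2166136261u;		/* FNV-1a offset basis */

	for (size_t i = 0; i < TUPLE_HASH_LEN; i++) {
		h ^= p[i];
		h *= 16777619u;			/* FNV-1a prime */
	}
	return h;
}
```

Tuples should be zeroed (for example with memset()) before filling them in, so the padding bytes inside the key are deterministic.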
>
> Patch #2 adds the transmit path type field to the flow tuple. Two transmit
> paths are supported so far: the neighbour and the xfrm transmit
> paths. This patch prepares for a new direct Ethernet
> transmit path (see patch #7).
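A minimal C sketch of a per-tuple transmit path type as described for patch #2, assuming it is a small enum chosen when the flow is set up and consulted for every fast-path packet. The enum values and function name are hypothetical, not the identifiers used in the patchset.

```c
/* Hypothetical transmit path types, one per output routine. */
enum xmit_type {
	XMIT_NEIGH,	/* resolve the next hop through the neighbour layer */
	XMIT_XFRM,	/* hand the packet to the IPsec (xfrm) output path */
	XMIT_DIRECT,	/* push the Ethernet header and queue it directly */
};

/* Hypothetical flow tuple reduced to the field of interest. */
struct flow_tuple_sketch {
	enum xmit_type xmit_type;	/* chosen once, when the flow is offloaded */
};

/* Returns the name of the output routine a fast-path packet would take. */
const char *xmit_path_name(const struct flow_tuple_sketch *t)
{
	switch (t->xmit_type) {
	case XMIT_NEIGH:
		return "neigh";
	case XMIT_XFRM:
		return "xfrm";
	case XMIT_DIRECT:
		return "direct";
	}
	return "unknown";
}
```

Storing the type in the tuple keeps the per-packet decision to a single switch instead of re-deriving the output path each time.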
>
> Patch #3 adds dev_fill_forward_path() and .ndo_fill_forward_path() to
> netdev_ops. This new function describes the list of netdevice hops
> to reach a given destination MAC address in the local network topology,
> e.g.
>
>              IP forwarding
>             /             \
>          br0              eth0
>          / \
>     veth1   veth2
>       .
>       .
>       .
>      eth0
>  ab:cd:ef:ab:cd:ef
>
> where veth1 and veth2 are bridge ports and eth0 provides Internet
> connectivity. eth0 is the interface in the VM which is connected to
> the veth1 bridge port. Then, for packets going to br0 whose
> destination MAC address is ab:cd:ef:ab:cd:ef, dev_fill_forward_path()
> provides the following path: br0 -> veth1.
>
> Patch #4 adds .ndo_fill_forward_path for VLAN devices, which provides the next
> device hop via vlan->real_dev. It also records the VLAN id and protocol,
> which tells the flowtable which VLAN headers to expect from the ingress
> device and which VLAN headers to push in the egress path.
>
> Patch #5 adds .ndo_fill_forward_path for bridge devices, which looks up
> the FDB to locate the next device hop (bridge port) in the
> forwarding path.
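A minimal user-space C sketch of the hop walk described in patches #3 to #5, assuming a simplified device model: a VLAN device forwards to its real_dev, a bridge looks up the destination MAC in its FDB, and any other device ends the walk. All types and functions here (`struct sketch_dev`, `fill_forward_path()`, `fdb_lookup()`) are hypothetical; they only mirror the description, not the kernel's dev_fill_forward_path() API.

```c
#include <stddef.h>
#include <string.h>

#define FWD_PATH_MAX 5	/* hypothetical maximum number of hops */

struct sketch_dev;

/* One FDB entry: destination MAC -> bridge port. */
struct fdb_entry {
	unsigned char mac[6];
	struct sketch_dev *port;
};

enum dev_kind { DEV_PHYS, DEV_VLAN, DEV_BRIDGE };

struct sketch_dev {
	const char *name;
	enum dev_kind kind;
	struct sketch_dev *real_dev;	/* DEV_VLAN only: lower device */
	const struct fdb_entry *fdb;	/* DEV_BRIDGE only: FDB entries */
	size_t fdb_len;
};

struct fwd_path {
	const struct sketch_dev *hops[FWD_PATH_MAX];
	int num;
};

/* Linear FDB lookup; returns the port that owns @mac, or NULL. */
static const struct sketch_dev *fdb_lookup(const struct sketch_dev *br,
					   const unsigned char *mac)
{
	for (size_t i = 0; i < br->fdb_len; i++)
		if (memcmp(br->fdb[i].mac, mac, 6) == 0)
			return br->fdb[i].port;
	return NULL;
}

/*
 * Walks from a virtual device down to a real one, recording each hop.
 * Returns 0 on success, -1 if a lookup fails or the path is too long.
 */
int fill_forward_path(const struct sketch_dev *dev, const unsigned char *daddr,
		      struct fwd_path *path)
{
	path->num = 0;
	while (dev) {
		/* Check bounds before writing so the array is never overrun. */
		if (path->num >= FWD_PATH_MAX)
			return -1;
		path->hops[path->num++] = dev;

		switch (dev->kind) {
		case DEV_PHYS:
			return 0;			/* real device reached */
		case DEV_VLAN:
			dev = dev->real_dev;		/* next hop: lower device */
			break;
		case DEV_BRIDGE:
			dev = fdb_lookup(dev, daddr);	/* next hop: bridge port */
			break;
		}
	}
	return -1;	/* VLAN without lower device, or FDB miss */
}
```

With br0 holding an FDB entry for ab:cd:ef:ab:cd:ef on veth1, walking from br0 records br0 -> veth1, matching the example in patch #3; a VLAN device stacked on br0 would add one more hop in front.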
>
> Patch #6 updates the flowtable to use the dev_fill_forward_path()
> infrastructure to obtain the ingress device in the fastpath.
>
> Patch #7 updates the flowtable to use dev_fill_forward_path() to obtain the
> egress device in the forwarding path. This also adds the direct
> Ethernet transmit path, which pushes the Ethernet header onto the
> packet and sends it through dev_queue_xmit(). This patch adds
> bridge support, so bridge ports use this direct xmit path.
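A minimal user-space C sketch of the "push the Ethernet header" step of the direct transmit path, assuming a packet buffer with headroom in front of the data, similar in spirit to skb_push(). `struct pkt_sketch`, `pkt_push()` and `push_eth_header()` are hypothetical; in the kernel the header would be built via dev_hard_header() and the packet then handed to dev_queue_xmit(), which this sketch does not model.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SK_ETH_ALEN 6	/* MAC address length */
#define SK_ETH_HLEN 14	/* dst MAC + src MAC + EtherType */

/* Hypothetical packet buffer with headroom in front of the data. */
struct pkt_sketch {
	uint8_t buf[256];
	size_t head;	/* offset of the first valid byte */
	size_t len;	/* number of valid bytes */
};

/* Grows the packet at the front; returns NULL without enough headroom. */
static uint8_t *pkt_push(struct pkt_sketch *p, size_t n)
{
	if (p->head < n)
		return NULL;
	p->head -= n;
	p->len += n;
	return p->buf + p->head;
}

/* Prepends an Ethernet header; returns 0 on success, -1 on failure. */
int push_eth_header(struct pkt_sketch *p, const uint8_t dst[SK_ETH_ALEN],
		    const uint8_t src[SK_ETH_ALEN], uint16_t ethertype)
{
	uint8_t *h = pkt_push(p, SK_ETH_HLEN);

	if (!h)
		return -1;
	memcpy(h, dst, SK_ETH_ALEN);
	memcpy(h + SK_ETH_ALEN, src, SK_ETH_ALEN);
	h[12] = (uint8_t)(ethertype >> 8);	/* EtherType is big-endian */
	h[13] = (uint8_t)(ethertype & 0xff);
	return 0;
}
```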
>
> Patch #8 adds ingress VLAN support (up to 2 VLAN tags, QinQ). The VLAN
> information is also provided by dev_fill_forward_path(). The VLAN id
> and protocol are stored in the flow tuple for hash lookups. VLAN
> support in the xmit path is achieved by annotating the first VLAN
> device found in the xmit path and by calling dev_hard_header()
> (as in the previous patch, #7) before dev_queue_xmit().
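A minimal C sketch of keeping up to two VLAN tags with a flow, assuming a fixed-size encapsulation array in the tuple with index 0 as the outermost tag, and assuming a third tag simply makes the flow ineligible for the fast path. The names are hypothetical. On transmit, the tags are serialized as TPID/TCI pairs, outermost first, which is the 802.1ad/802.1Q wire order between the source MAC and the inner EtherType.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_ENCAP 2	/* up to two tags (QinQ), per the description */

/* One VLAN tag as seen on the path: VLAN id plus tag protocol (TPID). */
struct vlan_encap {
	uint16_t id;
	uint16_t proto;	/* e.g. 0x8100 (802.1Q) or 0x88a8 (802.1ad) */
};

/* Hypothetical tuple fragment; encap[0] is the outermost tag. */
struct tuple_encap {
	struct vlan_encap encap[MAX_ENCAP];
	int num;
};

/* Records one tag; returns -1 if the flow already carries MAX_ENCAP tags. */
int tuple_add_encap(struct tuple_encap *t, uint16_t id, uint16_t proto)
{
	if (t->num >= MAX_ENCAP)
		return -1;			/* assumed: not offloaded */
	t->encap[t->num].id = id & 0x0fff;	/* 12-bit VLAN id */
	t->encap[t->num].proto = proto;
	t->num++;
	return 0;
}

/* Writes TPID + TCI pairs, outermost first; returns bytes written. */
size_t write_vlan_tags(const struct tuple_encap *t, uint8_t *out)
{
	size_t off = 0;

	for (int i = 0; i < t->num; i++) {
		out[off++] = (uint8_t)(t->encap[i].proto >> 8);
		out[off++] = (uint8_t)(t->encap[i].proto & 0xff);
		/* TCI: PCP and DEI bits left as zero in this sketch. */
		out[off++] = (uint8_t)(t->encap[i].id >> 8);
		out[off++] = (uint8_t)(t->encap[i].id & 0xff);
	}
	return off;
}
```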
>
> Patch #9 extends the nft_flowtable.sh selftest with a test that
> covers the bridge and VLAN support added in this patchset.
>
> = Performance numbers
>
> My testbed environment consists of three containers:
>
>   192.168.20.2       .20.1   .10.1      10.141.10.2
>      veth0           veth0   veth1         veth0
>       ns1 <---------> nsr1 <--------> ns2
>                              SNAT
>     iperf -c                               iperf -s
>
> where nsr1 is used for forwarding. There is a bridge device br0 in nsr1;
> veth0 is a port of br0. SNAT is performed on the veth1 device of nsr1.
>
> - ns2 runs iperf -s
> - ns1 runs iperf -c 10.141.10.2 -n 100G
>
> My results are:
>
> - Baseline (no flowtable, classic forwarding path + netfilter): ~16 Gbit/s
> - Fastpath (with flowtable, this patchset): ~25 Gbit/s
>
> This is an improvement of ~50% compared to baseline.
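A quick arithmetic check of the relative gain, using only the approximate figures quoted above (not a new measurement):

```latex
\frac{25\,\text{Gbit/s} - 16\,\text{Gbit/s}}{16\,\text{Gbit/s}}
  = \frac{9}{16} = 0.5625 \approx 56\%
```

So "~50%" is a rounded-down reading of roughly a 56% throughput increase.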
>
> Please, apply. Thank you.
>
> Pablo Neira Ayuso (9):
> netfilter: flowtable: add hash offset field to tuple
> netfilter: flowtable: add xmit path types
> net: resolve forwarding path from virtual netdevice and HW destination address
> net: 8021q: resolve forwarding path for vlan devices
> bridge: resolve forwarding path for bridge devices
> netfilter: flowtable: use dev_fill_forward_path() to obtain ingress device
> netfilter: flowtable: use dev_fill_forward_path() to obtain egress device
> netfilter: flowtable: add vlan support
> selftests: netfilter: flowtable bridge and VLAN support
>
> include/linux/netdevice.h | 35 +++
> include/net/netfilter/nf_flow_table.h | 43 +++-
> net/8021q/vlan_dev.c | 15 ++
> net/bridge/br_device.c | 27 +++
> net/core/dev.c | 46 ++++
> net/netfilter/nf_flow_table_core.c | 51 +++--
> net/netfilter/nf_flow_table_ip.c | 200 ++++++++++++++----
> net/netfilter/nft_flow_offload.c | 159 +++++++++++++-
> .../selftests/netfilter/nft_flowtable.sh | 82 +++++++
> 9 files changed, 598 insertions(+), 60 deletions(-)
>
> --
> 2.20.1
Al