Re: [PATCH net-next] net: ipvlan: forward ingress packet to slave's l2 in l3s mode

From: Paolo Abeni
Date: Wed Jun 26 2019 - 04:17:06 EST


Hi,

On Tue, 2019-06-25 at 14:42 +0800, Zhiyuan Hou wrote:
> In ipvlan l3s mode, ingress packet is switched to slave interface and
> delivers to l4 stack. This may cause two problems:
>
> 1. When slave is in an ns different from master, the behavior of stack
> in slave ns may cause confusion for users. For example, iptables, tc,
> and other l2/l3 functions are not available for ingress packet.
>
> 2. l3s mode is not used for tap device, and cannot support ipvtap. But
> in VM or container based VM cases, tap device is a very common device.
>
> In l3s mode's input nf_hook, this patch calles the skb_forward_dev() to
> forward ingress packet to slave and uses nf_conntrack_confirm() to make
> conntrack work with new mode.
>
> Signed-off-by: Zha Bin <zhabin@xxxxxxxxxxxxxxxxx>
> Signed-off-by: Zhiyuan Hou <zhiyuan2048@xxxxxxxxxxxxxxxxx>
> ---
> drivers/net/ipvlan/ipvlan.h | 9 ++++++++-
> drivers/net/ipvlan/ipvlan_l3s.c | 16 ++++++++++++++--
> 2 files changed, 22 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/ipvlan/ipvlan.h b/drivers/net/ipvlan/ipvlan.h
> index 3837c897832e..48c814e24c3f 100644
> --- a/drivers/net/ipvlan/ipvlan.h
> +++ b/drivers/net/ipvlan/ipvlan.h
> @@ -172,6 +172,14 @@ void ipvlan_link_delete(struct net_device *dev, struct list_head *head);
> void ipvlan_link_setup(struct net_device *dev);
> int ipvlan_link_register(struct rtnl_link_ops *ops);
> #ifdef CONFIG_IPVLAN_L3S
> +
> +#include <net/netfilter/nf_conntrack_core.h>
> +
> +static inline int ipvlan_confirm_conntrack(struct sk_buff *skb)
> +{
> + return nf_conntrack_confirm(skb);
> +}
> +
> int ipvlan_l3s_register(struct ipvl_port *port);
> void ipvlan_l3s_unregister(struct ipvl_port *port);
> void ipvlan_migrate_l3s_hook(struct net *oldnet, struct net *newnet);
> @@ -206,5 +214,4 @@ static inline bool netif_is_ipvlan_port(const struct net_device *dev)
> {
> return rcu_access_pointer(dev->rx_handler) == ipvlan_handle_frame;
> }
> -
> #endif /* __IPVLAN_H */
> diff --git a/drivers/net/ipvlan/ipvlan_l3s.c b/drivers/net/ipvlan/ipvlan_l3s.c
> index 943d26cbf39f..ed210002f593 100644
> --- a/drivers/net/ipvlan/ipvlan_l3s.c
> +++ b/drivers/net/ipvlan/ipvlan_l3s.c
> @@ -95,14 +95,26 @@ static unsigned int ipvlan_nf_input(void *priv, struct sk_buff *skb,
> {
> struct ipvl_addr *addr;
> unsigned int len;
> + int ret = NF_ACCEPT;
> + bool success;
>
> addr = ipvlan_skb_to_addr(skb, skb->dev);
> if (!addr)
> goto out;
>
> - skb->dev = addr->master->dev;
> len = skb->len + ETH_HLEN;
> - ipvlan_count_rx(addr->master, len, true, false);
> +
> + ret = ipvlan_confirm_conntrack(skb);
> + if (ret != NF_ACCEPT) {
> + ipvlan_count_rx(addr->master, len, false, false);
> + goto out;
> + }
> +
> + skb_push_rcsum(skb, ETH_HLEN);
> + success = dev_forward_skb(addr->master->dev, skb) == NET_RX_SUCCESS;

This looks weird to me: if I read the code correctly, the skb will
traverse twice NF_INET_LOCAL_IN, once due to the l3s hooking and
another one due to dev_forward_skb().

Also, tc ingreess, etc will run after the first traversing of
NF_INET_LOCAL_IN.

All in all I think that if full l2 processing is required, a different
mode or a different virtual device should be used.

Cheers,

Paolo