Re: [PATCH net-next] net: ipvlan: forward ingress packet to slave's l2 in l3s mode

From: Zhiyuan Hou
Date: Fri Jun 28 2019 - 04:59:59 EST

å 2019/6/26 äå4:16, Paolo Abeni åé:

On Tue, 2019-06-25 at 14:42 +0800, Zhiyuan Hou wrote:
In ipvlan l3s mode, ingress packet is switched to slave interface and
delivers to l4 stack. This may cause two problems:

1. When slave is in an ns different from master, the behavior of stack
in slave ns may cause confusion for users. For example, iptables, tc,
and other l2/l3 functions are not available for ingress packet.

2. l3s mode is not used for tap device, and cannot support ipvtap. But
in VM or container based VM cases, tap device is a very common device.

In l3s mode's input nf_hook, this patch calles the skb_forward_dev() to
forward ingress packet to slave and uses nf_conntrack_confirm() to make
conntrack work with new mode.

Signed-off-by: Zha Bin <zhabin@xxxxxxxxxxxxxxxxx>
Signed-off-by: Zhiyuan Hou <zhiyuan2048@xxxxxxxxxxxxxxxxx>
drivers/net/ipvlan/ipvlan.h | 9 ++++++++-
drivers/net/ipvlan/ipvlan_l3s.c | 16 ++++++++++++++--
2 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ipvlan/ipvlan.h b/drivers/net/ipvlan/ipvlan.h
index 3837c897832e..48c814e24c3f 100644
--- a/drivers/net/ipvlan/ipvlan.h
+++ b/drivers/net/ipvlan/ipvlan.h
@@ -172,6 +172,14 @@ void ipvlan_link_delete(struct net_device *dev, struct list_head *head);
void ipvlan_link_setup(struct net_device *dev);
int ipvlan_link_register(struct rtnl_link_ops *ops);
+#include <net/netfilter/nf_conntrack_core.h>
+static inline int ipvlan_confirm_conntrack(struct sk_buff *skb)
+ return nf_conntrack_confirm(skb);
int ipvlan_l3s_register(struct ipvl_port *port);
void ipvlan_l3s_unregister(struct ipvl_port *port);
void ipvlan_migrate_l3s_hook(struct net *oldnet, struct net *newnet);
@@ -206,5 +214,4 @@ static inline bool netif_is_ipvlan_port(const struct net_device *dev)
return rcu_access_pointer(dev->rx_handler) == ipvlan_handle_frame;
#endif /* __IPVLAN_H */
diff --git a/drivers/net/ipvlan/ipvlan_l3s.c b/drivers/net/ipvlan/ipvlan_l3s.c
index 943d26cbf39f..ed210002f593 100644
--- a/drivers/net/ipvlan/ipvlan_l3s.c
+++ b/drivers/net/ipvlan/ipvlan_l3s.c
@@ -95,14 +95,26 @@ static unsigned int ipvlan_nf_input(void *priv, struct sk_buff *skb,
struct ipvl_addr *addr;
unsigned int len;
+ int ret = NF_ACCEPT;
+ bool success;
addr = ipvlan_skb_to_addr(skb, skb->dev);
if (!addr)
goto out;
- skb->dev = addr->master->dev;
len = skb->len + ETH_HLEN;
- ipvlan_count_rx(addr->master, len, true, false);
+ ret = ipvlan_confirm_conntrack(skb);
+ if (ret != NF_ACCEPT) {
+ ipvlan_count_rx(addr->master, len, false, false);
+ goto out;
+ }
+ skb_push_rcsum(skb, ETH_HLEN);
+ success = dev_forward_skb(addr->master->dev, skb) == NET_RX_SUCCESS;
This looks weird to me: if I read the code correctly, the skb will
traverse twice NF_INET_LOCAL_IN, once due to the l3s hooking and
another one due to dev_forward_skb().

Also, tc ingreess, etc will run after the first traversing of

Yes, but the skb's device has been modified from the master to slave. In most use cases of

ipvlan, the master device and slave device are in different namespace (ns), so the second

triggered LOCAL_IN is completely isolated from the first triggered LOCAL_IN.

When the master device and slave device are in the same ns, the behavior of this patch is

similar to that of L2 over L3 tunnel (forwarding from L3 to L2 device).

Since the device has been modified, the second triggered tc-ingress is thus different.

All in all I think that if full l2 processing is required, a different
mode or a different virtual device should be used.

We can implement it in a new mode, but such a way is similar to the current ipvlan l3s mode.

Also, ipvlan l3s mode has two problems described in patch's commit log. I think that a more

appropriate solution is to modify ipvlan l3s.