Re: [PATCH] net: dev_forward_skb(): Scrub packet's per-netns info only when crossing netns
From: Liran Alon
Date: Thu Mar 15 2018 - 08:24:09 EST
----- daniel@xxxxxxxxxxxxx wrote:
> On 03/15/2018 10:21 AM, Shmulik Ladkani wrote:
> > Regarding the premise of this commit, this "reduces" the
> > ipvs/orphan/mark scrubbing in the following *non* xnet situations:
> >
> > 1. mac2vlan port xmit to other macvlan ports in Bridge Mode
> > 2. similarly for ipvlan
> > 3. veth xmit
> > 4. l2tp_eth_dev_recv
> > 5. bpf redirect/clone_redirect ingress actions
> >
> > Regarding l2tp recv, this commit seems to align the srubbing
> behavior
> > with ip tunnels (full scrub only if crossing netns, see
> ip_tunnel_rcv).
> >
> > Regarding veth xmit, it does makes sense to preserve the fields if
> not
> > crossing netns. This is also the case when one uses tc mirred.
> >
> > Regarding bpf redirect, well, it depends on the expectations of each
> bpf
> > program.
> > I'd argue that preserving the fields (at least the mark field) in
> the
> > *non* xnet makes sense and provides more information and therefore
> more
> > capabilities; Alas this might change behavior already being relied
> on.
> >
> > Maybe Daniel can comment on the matter.
>
> Overall I think it might be nice to not need scrubbing skb in such
> cases,
> although my concern would be that this has potential to break
> existing
> setups when they would expect mark being zero on other veth peer in
> any
> case since it's the behavior for a long time already. The safer
> option
> would be to have some sort of explicit opt-in e.g. on link creation to
> let
> the skb->mark pass through unscrubbed. This would definitely be a
> useful
> option e.g. when mark is set in the netns facing veth via
> clsact/egress
> on xmit and when the container is unprivileged anyway.
>
> Thanks,
> Daniel
I see your point in regards to backwards comparability.
However, not scrubbing skb when it cross netns via some kernel functions compared to
others is basically a bug which could easily break with a little bit of more refactoring.
Therefore, it seems a bit weird to me to from now on, we will force
every user on link creation to consider that once there was a bug leading
to this weird behavior on specific netdevs.
Thus, I suggest to maybe control this via a global /proc/sys/net file instead.
-Liran