Re: [PATCH net-next] net: dsa: sja1105: disable rxvlan offload for the DSA master

From: Florian Fainelli
Date: Sun May 17 2020 - 17:34:56 EST




On 5/12/2020 4:49 PM, Vladimir Oltean wrote:
From: Vladimir Oltean <vladimir.oltean@xxxxxxx>

On sja1105 operating in best_effort_vlan_filtering mode (when the TPID
of the DSA tags is 0x8100), it can be seen that __netif_receive_skb_core
calls __vlan_hwaccel_clear_tag right before passing the skb to the DSA
packet_type handler.

This means that the tagger does not see the VLAN tag in the skb, nor in
the skb meta data.

The patch that started zeroing the skb VLAN tag is:

commit d4b812dea4a236f729526facf97df1a9d18e191c
Author: Eric Dumazet <edumazet@xxxxxxxxxx>
Date: Thu Jul 18 07:19:26 2013 -0700

vlan: mask vlan prio bits

In commit 48cc32d38a52d0b68f91a171a8d00531edc6a46e
("vlan: don't deliver frames for unknown vlans to protocols")
Florian made sure we set pkt_type to PACKET_OTHERHOST
if the vlan id is set and we could find a vlan device for this
particular id.

But we also have a problem if prio bits are set.

Steinar reported an issue on a router receiving IPv6 frames with a
vlan tag of 4000 (id 0, prio 2), and tunneled into a sit device,
because skb->vlan_tci is set.

Forwarded frame is completely corrupted : We can see (8100:4000)
being inserted in the middle of IPv6 source address :

16:48:00.780413 IP6 2001:16d8:8100:4000:ee1c:0:9d9:bc87 >
9f94:4d95:2001:67c:29f4::: ICMP6, unknown icmp6 type (0), length 64
0x0000: 0000 0029 8000 c7c3 7103 0001 a0ae e651
0x0010: 0000 0000 ccce 0b00 0000 0000 1011 1213
0x0020: 1415 1617 1819 1a1b 1c1d 1e1f 2021 2223
0x0030: 2425 2627 2829 2a2b 2c2d 2e2f 3031 3233

It seems we are not really ready to properly cope with this right now.

We can probably do better in future kernels :
vlan_get_ingress_priority() should be a netdev property instead of
a per vlan_dev one.

For stable kernels, lets clear vlan_tci to fix the bugs.

Reported-by: Steinar H. Gunderson <sesse@xxxxxxxxxx>
Signed-off-by: Eric Dumazet <edumazet@xxxxxxxxxx>
Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx>

The patch doesn't say why "we are not really ready to properly cope with
this right now", and hence why the best solution is to remove the VLAN
tag from skb's that don't have a local VLAN sub-interface interested in
them. And I have no idea either.

But the above patch has a loophole: if the VLAN tag is not
hw-accelerated, it isn't removed from the skb if there is no VLAN
sub-interface interested in it (our case). So we are hooking into the
.ndo_fix_features callback of the DSA master and clearing the rxvlan
offload feature, so the DSA tagger will always see the VLAN as part of
the skb data. This is symmetrical with the ETH_P_DSA_8021Q case and does
not need special treatment in the tagger.

If there was an API by which the dsa tag_8021q module would declare its
interest in servicing VLANs 1024-3071, such that the packets wouldn't be
classified as PACKET_OTHERHOST, and if that API wasn't as tightly
integrated with the 8021q module as vlan_find_dev/vlan_group_set_device
are, I would be interested in using it, but so far I couldn't find it.
With this patch, even though the frames still are PACKET_OTHERHOST, at
least the VLAN tag reaches far enough that the DSA packet_type handler
sees and consumes it.

The approach seems fine to me, I have to admit I am equally confused about Eric's patch other than it being a stop gap measure for a problem that was observed.

Reviewed-by: Florian Fainelli <f.fainelli@xxxxxxxxx>

In case we have a DSA master that does support NETIF_F_HW_VLAN_CTAG_RX though we should probably ensure that we install a VLAN filter there too, which we do not do currently.
--
Florian