Re: [PATCH net] net: dsa: Treat VLAN ID 0 as PVID untagged

From: Florian Fainelli
Date: Wed Feb 12 2020 - 17:55:06 EST


On 2/12/20 2:38 PM, Vladimir Oltean wrote:
> On Wed, 12 Feb 2020 at 23:28, Florian Fainelli <f.fainelli@xxxxxxxxx> wrote:
>>
>> On 2/12/20 12:47 PM, Vladimir Oltean wrote:
>>> Hi Florian,
>>>
>>> On Wed, 12 Feb 2020 at 22:06, Florian Fainelli <f.fainelli@xxxxxxxxx> wrote:
>>>>
>>>> VLAN ID 0 is special by all kinds and is really meant to be the default
>>>> ingress and egress untagged VLAN. We were not configuring it that way
>>>> and so we would be ingress untagged but egress tagged.
>>>>
>>>> When our devices are interfaced with other link partners such as switch
>>>> devices, the results would be entirely equipment dependent. Some
>>>> switches are completely fine with accepting an egress tagged frame with
>>>> VLAN ID 0 and would send their responses untagged, so everything works,
>>>> but other devices are not so tolerant and would typically reject a VLAN
>>>> ID 0 tagged frame.
>>>
>>> Are you sure that it's not in fact those devices that are not doing
>>> what they're supposed to? VID 0 should be sent as tagged and no port
>>> membership checks should be enforced on it.
>>
>> Where everything works what I see is the following:
>>
>> - Linux on egress sends an untagged frame (as captured by tcpdump) but
>> the VLAN entry for VID 0 makes it egress tagged and the machine on the
>> other sees it as such as well
>
> So the operating system is sending untagged traffic, it gets
> pvid-tagged by the hardware on the CPU port and is sent as
> egress-tagged on the front panel.
> Odd, but ok, not illegal, I suppose. The odd part is caused by having
> the vid 0 as pvid. Otherwise, having vid 0 as egress-tagged is not in
> itself a problem, since the assumption is that the only way a frame
> would get to the switch with VID 0 was if that VID was already in the
> tag.
> If anything, changing the pvid to something that is egress-untagged
> will give you some nice throughput boost, save you 4 bytes on the wire
> per frame.
>
>> - the response from that machine is also ingress tagged as captured from
>> the DSA master network device
>>
>> what I do not have visibility into are systems where this does not work
>> but will try to request that.
>
> Well, we can talk until the cows come home, but until the drop reason
> on those devices is clear, I would refrain from drawing any
> conclusion.
>
>> Breaking users is obviously bad which
>> prompted me for doing this specification violating frame. I am not sure
>> whether DSA standalone ports qualify as managed ports or not, sounds
>> like no given we have not added support for doing much UC/MC filtering
>> unlike what NICs do.
>>
>>>
>>>>
>>>> Fixes: 061f6a505ac3 ("net: dsa: Add ndo_vlan_rx_{add, kill}_vid implementation")
>>>> Signed-off-by: Florian Fainelli <f.fainelli@xxxxxxxxx>
>>>> ---
>>>> Hi all,
>>>>
>>>> After looking at all DSA drivers and how they implement port_vlan_add()
>>>> I think this is the right change to do, but would appreciate if you
>>>> could test this on your respective platforms to ensure this is not
>>>> problematic.
>>>
>>> I'm pretty sure this is problematic, for the simple reason that with
>>> this change, DSA is insisting that the default PVID is 0, contrary to
>>> the bridge core which insists it is 1. And some switches, like the
>>> Microchip Ocelot/Felix, don't support more than 1 egress-untagged
>>> VLAN, so adding one of the VIDs 0 or 1 will fail (I don't know the
>>> exact order off-hand). See 1c44ce560b4d ("net: mscc: ocelot: fix
>>> vlan_filtering when enslaving to bridge before link is up") for more
>>> details of how that is going to work.
>>
>> OK, I do wonder if we would be better off just skipping the VLAN
>> programming for VID = 0 and/or just defining a different
>> reserved/default VLAN ID for switches that have global VLAN filtering.
>>
>
> Oh, right, I remember. This is one of the switches with b53_default_pvid=0?
> So what were you saying... [0]

Yes that is correct, and we also have VLAN filtering being a global
property, which is why we cannot simply reject programming with VID = 0,
since that is the default, and default VID still requires a VLAN entry.

>
>>>>> Why should we bend the framework because sja1105 and dsa_8021q are
>>>>> special?
>
> [0]: https://lore.kernel.org/netdev/670c1d7f-4d2c-e9b4-3057-e87a66ad0d33@xxxxxxxxx/
>
> So having 0 as pvid will inevitably cause problems trying to do
> something meaningful with it on egress. Send it tagged, it'll mess
> with your untagged traffic, send it untagged, it'll mess with your
> stack-originated 802.1p-tagged traffic, as well as 802.1p-tagged
> traffic forwarded from other endpoints. Hence the reason why IEEE said
> "don't do that".
>
> Can't you do whatever workarounds with vid 0 and/or
> NETIF_F_HW_VLAN_CTAG_FILTER that are restricted to b53? The "flags"
> value of 0 for 802.1p tagged frames is fine under the assumption that
> vid 0 is never going to be a pvid, which it'd better not be. The
> bridge doesn't even let you run "bridge vlan add dev swp0 vid 0 pvid
> untagged", that should say something.

Yes, forcing the default VID to be unconditionally untagged (but not
PVID), scoped specifically to b53 works nicely, with, or without VLAN
filtering and I can confirm that no VID 0 tag is sent on the wire (as
expected).

Thanks!
--
Florian