Re: [PATCH net v2] net: dsa: tag_brcm: legacy: fix untagged rx on unbridged ports for bcm63xx

From: Jonas Gorski

Date: Tue Oct 28 2025 - 06:15:36 EST


On Mon, Oct 27, 2025 at 10:15 PM Vladimir Oltean <olteanv@xxxxxxxxx> wrote:
>
> On Mon, Oct 27, 2025 at 08:46:21PM +0100, Jonas Gorski wrote:
> > The internal switch on BCM63XX SoCs will unconditionally add 802.1Q VLAN
> > tags on egress to CPU when 802.1Q mode is enabled. We do this
> > unconditionally since commit ed409f3bbaa5 ("net: dsa: b53: Configure
> > VLANs while not filtering").
> >
> > This is fine for VLAN aware bridges, but for standalone ports and vlan
> > unaware bridges this means all packets are tagged with the default VID,
> > which is 0.
> >
> > While the kernel will treat that like untagged, this can break userspace
> > applications processing raw packets, expecting untagged traffic, like
> > STP daemons.
> >
> > This also breaks several bridge tests, where the tcpdump output then
> > does not match the expected output anymore.
> >
> > Since 0 isn't a valid VID, just strip out the VLAN tag if we encounter
> > it, unless the priority field is set, since that would be a valid tag
> > again.
> >
> > Fixes: 964dbf186eaa ("net: dsa: tag_brcm: add support for legacy tags")
> > Signed-off-by: Jonas Gorski <jonas.gorski@xxxxxxxxx>
> > ---
>
> Reviewed-by: Vladimir Oltean <olteanv@xxxxxxxxx>
>
> Sorry for dropping the ball on v1. To reply to your reply there,
> https://lore.kernel.org/netdev/CAOiHx=mNnMJTnAN35D6=LPYVTQB+oEmedwqrkA6VRLRVi13Kjw@xxxxxxxxxxxxxx/
> I hadn't realized that b53 sets ds->untag_bridge_pvid conditionally,
> which makes any consolidation work in stable trees very complicated
> (although still desirable in net-next).

It's for some more obscure cases where we cannot use the Broadcom tag,
like a switch where the CPU port isn't a management port but a normal
port. I am not sure this really exists, but maybe Florian knows if
there are any (still used) boards where this applies.

If not, I am more than happy to reject this path as -EINVAL instead of
the current TAG_NONE with untag_bridge_pvid = true.

>
> | And to sidetrack the discussion a bit, I wonder if calling
> | __vlan_hwaccel_clear_tag() in
> | dsa_software_untag_vlan_(un)aware_bridge() without checking the
> | vlan_tci field strips 802.1p information from packets that have it. I
> | fail to find if this is already parsed and stored somewhere at a first
> | glance.
>
> 802.1p information should be parsed in vlan_do_receive() if vlan_find_dev()
> found something:
>
> skb->priority = vlan_get_ingress_priority(vlan_dev, skb->vlan_tci);
>
> This logic is invoked straight from __netif_receive_skb_core(), and
> relative to dsa_switch_rcv(), it hasn't run yet.

I see. So I presume it runs again after dsa_switch_rcv() has run? Or
rather __netif_receive_skb_core() jumps to to another_round or so?

> Apart from that and user-configured netfilter/tc rules, I don't think
> there's anything else in the kernel that processes the vlan_tci on
> ingress (of course that isn't to say anything about user space).
>
> With regard to dsa_software_untag_vlan_unaware_bridge(), which I'd like
> to see removed rather than reworked, it does force you to use a br0.0
> VLAN upper to not strip the 802.1p info, which is OK.
>
> With regard to dsa_software_untag_vlan_aware_bridge(), it only strips
> VID values which are != 0 (because the bridge PVID iself is != 0 - if it
> was zero that would be another bug, the port should have dropped the
> packet with a VLAN-aware bridge).

AFAICT if a port on a vlan aware bridge has no PVID configured and
receives a VID 0 tagged packet, then
dsa_software_untag_vlan_aware_bridge() will strip it:

Since skb-proto is 802.1Q, we call skb_vlan_untag(), which will set
skb->vlan_proto to 802.1q and skb->vlan_tci to 0

skb->vlan_all is now != 0, so skb_vlan_tag_present() returns true.

skb_vlan_tag_get_id() returns 0, since skb->vlan_tci is 0.

So on a vlan_aware bridge with untag_vlan_aware_bridge_pvid enabled we
now call dsa_software_untag_vlan_aware_bridge() with vid = 0.

In there br_vlan_get_pvid_rcu() sets pvid to 0 if no PVID is
configured since br_get_pvid() returns 0 in that case, so the
condition (vid == pvid) is fulfilled, so we call
__vlan_hwaccel_clear_tag(), stripping the tag.

Obviously for any other configurations (e.g. PVID > 0 and VID = 0) we
don't do anything.

For BCM63xx, the issue with VID 0 tagged traffic on egress on the CPU
port is only if there is no PVID configured on a port - once there is
a PVID, the ingress untagged traffic will be tagged with the
appropriate VLAN ID on egress on the CPU port, and everything is fine.
So in a sense it happens to work by accident for ports without a PVID
and untagged traffic on them.

But vlan aware bridges are the only cases where PVIDs do get
configured/are active, in all other configurations PVID is programmed
as 0 on ports.

Best regards,
Jonas