Re: [PATCH RFC net 2/2] net: dsa: propagate brentry flag changes
From: Vladimir Oltean
Date: Mon Apr 14 2025 - 08:49:59 EST
On Sat, Apr 12, 2025 at 02:24:28PM +0200, Jonas Gorski wrote:
> Currently any flag changes for brentry vlans are ignored, so the
> configured cpu port vlan will get stuck at whatever the original flags
> were.
>
> E.g.
>
> $ bridge vlan add dev swbridge vid 10 self pvid untagged
> $ bridge vlan add dev swbridge vid 10 self
>
> Would cause the vlan to get "stuck" at pvid untagged in the hardware,
> despite now being configured as tagged on the bridge.
>
> Fix this by passing on changed vlans to drivers, but do not increase the
> refcount for updates.
>
> Since we should never get an update for a non-existing VLAN, add a
> WARN_ON() in case it happens.
>
> Fixes: 134ef2388e7f ("net: dsa: add explicit support for host bridge VLANs")
> Signed-off-by: Jonas Gorski <jonas.gorski@xxxxxxxxx>
> ---
I think it's important to realize that the meaning of the "flags" of
VLANs offloaded to the CPU port is not completely defined.
"egress-untagged" from the perspective of the hardware CPU port is the
opposite direction compared to "egress-untagged" from the perspective of
the bridge device (one is Linux RX, the other is Linux TX).
Additionally, we install in DSA as host VLANs also those bridge port VLANs
which were configured by the user on foreign interfaces. It's not exactly
clear how to reconcile the "flags" of a VLAN installed on the bridge
itself with the "flags" of a VLAN installed on a foreign bridge port.
Example:
ip link add br0 type bridge vlan_filtering 1 vlan_default_pvid 0
ip link set veth0 master br0 # foreign interface, unrelated to DSA
ip link set swp0 master br0 # DSA interface
bridge vlan add dev br0 vid 1 self pvid untagged # leads to an "dsa_vlan_add_hw: cpu port N vid 1 untagged" trace event
bridge vlan add dev veth0 vid 1 # still leads to an "dsa_vlan_add_bump: cpu port N vid 1 refcount 2" trace event after your change
Depending on your expectations, you might think that host VID 1 would
also need to become egress-tagged in this case, although from the
bridge's perspective, it hasn't "changed", because it is a VLAN from a
different VLAN group (port veth0 vs bridge br0).
The reverse is true as well. Because the user can toggle the "pvid" flag
of the bridge VLAN, that will make the switchdev object be notified with
changed=true. But since DSA clears BRIDGE_VLAN_INFO_PVID, the host VLAN,
as programmed to hardware, would be identical, yet we reprogram it anyway.
Both would seem to indicate that "changed" from the bridge perspective
is not what matters for calling the driver, but a different "changed"
flag, calculated by DSA from its own perspective.
I was a bit reluctant to add such complexity in dsa_port_do_vlan_add(),
considering that many drivers treat the VLANs on the CPU port as
always-tagged towards software (not b53 though, except for
b53_vlan_port_needs_forced_tagged() which is only for DSA_TAG_PROTO_NONE).
In fact, what is not entirely clear to me is what happens if they _don't_
treat the CPU port in a special way. Because software needs to know in
which VLAN did the hardware begin to process a packet: if the software
bridge needs to continue the processing of that packet, it needs to do
so _in the same VLAN_. If the accelerator sends packets as VLAN-untagged
to software, that information is lost and VLAN hopping might take place.
So I was hoping that nobody would notice that the change of flags on
host VLANs isn't propagated to drivers, because none of the flags should
be of particular relevance in the first place.
I would like to understand better, in terms of user-visible impact, what
is the problem that you see?