Re: [PATCH RFC net 2/2] net: dsa: propagate brentry flag changes
From: Vladimir Oltean
Date: Mon Apr 14 2025 - 08:53:06 EST
+Jonas, whom I've mistakenly removed from To: :-/
On Mon, Apr 14, 2025 at 03:49:30PM +0300, Vladimir Oltean wrote:
> On Sat, Apr 12, 2025 at 02:24:28PM +0200, Jonas Gorski wrote:
> > Currently any flag changes for brentry vlans are ignored, so the
> > configured cpu port vlan will get stuck at whatever the original flags
> > were.
> >
> > E.g.
> >
> > $ bridge vlan add dev swbridge vid 10 self pvid untagged
> > $ bridge vlan add dev swbridge vid 10 self
> >
> > Would cause the vlan to get "stuck" at pvid untagged in the hardware,
> > despite now being configured as tagged on the bridge.
> >
> > Fix this by passing on changed vlans to drivers, but do not increase the
> > refcount for updates.
> >
> > Since we should never get an update for a non-existing VLAN, add a
> > WARN_ON() in case it happens.
> >
> > Fixes: 134ef2388e7f ("net: dsa: add explicit support for host bridge VLANs")
> > Signed-off-by: Jonas Gorski <jonas.gorski@xxxxxxxxx>
> > ---
>
> I think it's important to realize that the meaning of the "flags" of
> VLANs offloaded to the CPU port is not completely defined.
> "egress-untagged" from the perspective of the hardware CPU port is the
> opposite direction compared to "egress-untagged" from the perspective of
> the bridge device (one is Linux RX, the other is Linux TX).
>
> Additionally, we install in DSA as host VLANs also those bridge port VLANs
> which were configured by the user on foreign interfaces. It's not exactly
> clear how to reconcile the "flags" of a VLAN installed on the bridge
> itself with the "flags" of a VLAN installed on a foreign bridge port.
>
> Example:
> ip link add br0 type bridge vlan_filtering 1 vlan_default_pvid 0
> ip link set veth0 master br0 # foreign interface, unrelated to DSA
> ip link set swp0 master br0 # DSA interface
> bridge vlan add dev br0 vid 1 self pvid untagged # leads to an "dsa_vlan_add_hw: cpu port N vid 1 untagged" trace event
> bridge vlan add dev veth0 vid 1 # still leads to an "dsa_vlan_add_bump: cpu port N vid 1 refcount 2" trace event after your change
>
> Depending on your expectations, you might think that host VID 1 would
> also need to become egress-tagged in this case, although from the
> bridge's perspective, it hasn't "changed", because it is a VLAN from a
> different VLAN group (port veth0 vs bridge br0).
>
> The reverse is true as well. Because the user can toggle the "pvid" flag
> of the bridge VLAN, that will make the switchdev object be notified with
> changed=true. But since DSA clears BRIDGE_VLAN_INFO_PVID, the host VLAN,
> as programmed to hardware, would be identical, yet we reprogram it anyway.
>
> Both would seem to indicate that "changed" from the bridge perspective
> is not what matters for calling the driver, but a different "changed"
> flag, calculated by DSA from its own perspective.
>
> I was a bit reluctant to add such complexity in dsa_port_do_vlan_add(),
> considering that many drivers treat the VLANs on the CPU port as
> always-tagged towards software (not b53 though, except for
> b53_vlan_port_needs_forced_tagged() which is only for DSA_TAG_PROTO_NONE).
> In fact, what is not entirely clear to me is what happens if they _don't_
> treat the CPU port in a special way. Because software needs to know in
> which VLAN did the hardware begin to process a packet: if the software
> bridge needs to continue the processing of that packet, it needs to do
> so _in the same VLAN_. If the accelerator sends packets as VLAN-untagged
> to software, that information is lost and VLAN hopping might take place.
> So I was hoping that nobody would notice that the change of flags on
> host VLANs isn't propagated to drivers, because none of the flags should
> be of particular relevance in the first place.
>
> I would like to understand better, in terms of user-visible impact, what
> is the problem that you see?