Re: [PATCH net v3 1/1] net: dsa: microchip: implement multi-bridge support

From: Vladimir Oltean
Date: Fri Nov 26 2021 - 15:44:57 EST


On Fri, Nov 26, 2021 at 11:43:36AM -0800, Jakub Kicinski wrote:
> On Fri, 26 Nov 2021 13:39:26 +0100 Oleksij Rempel wrote:
> > Current driver version is able to handle only one bridge at time.
> > Configuring two bridges on two different ports would end up shorting this
> > bridges by HW. To reproduce it:
> >
> > ip l a name br0 type bridge
> > ip l a name br1 type bridge
> > ip l s dev br0 up
> > ip l s dev br1 up
> > ip l s lan1 master br0
> > ip l s dev lan1 up
> > ip l s lan2 master br1
> > ip l s dev lan2 up
> >
> > Ping on lan1 and get response on lan2, which should not happen.
> >
> > This happened, because current driver version is storing one global "Port VLAN
> > Membership" and applying it to all ports which are members of any
> > bridge.
> > To solve this issue, we need to handle each port separately.
> >
> > This patch is dropping the global port member storage and calculating
> > membership dynamically depending on STP state and bridge participation.
> >
> > Note: STP support was broken before this patch and should be fixed
> > separately.
> >
> > Fixes: c2e866911e25 ("net: dsa: microchip: break KSZ9477 DSA driver into two files")
>
> Suspicious, this sounds like a code reshuffling commit.

This intrigued me, so I looked it up. If you look at the git diff of
that commit, you'll see that the "member" variable of struct ksz_port
only appears in the green portion of the delta, not even once in the red
portion. In fact, struct ksz_port was _introduced_ by that commit!

> Where was the bad code introduced? The fixes tag should point at the
> earliest point in the git history where the problem exists.

What bad code? As far as I can tell, prior to that commit, there was no
restriction of forwarding domain applied to this switch. Heck,
->port_bridge_join() was added in that commit! After that commit,
restricting the forwarding domain was done poorly. No wonder, since
reviewers probably did not notice what was going on.

We are talking here about a very poorly written and subpar driver.
I wouldn't be too mad at Oleksij for not doing a cleaner job for this
commit, it's pretty much "delete the bogus stuff and rewrite it the way
it should be", which I think is fine at this stage, this driver needs that.
His code appears fine to my non-expert fine, I've added very similar
logic to ocelot which has the same constraints of juggling with the
forwarding domain based on STP states.

Also, in case you're wondering why I'm responding in his defense, it is
for very selfish reasons, of course :) I'd like to continue working on
this patch series:
https://patchwork.kernel.org/project/netdevbpf/cover/20211026162625.1385035-1-vladimir.oltean@xxxxxxx/
which is invasive because it touches every driver's bridging callbacks.
So the sooner that in-flight patches related to bridging can go in, the
sooner I can resend a new version of my API rework.