[PATCH net-next] net: dsa: mv88e6xxx: avoid some redundant vtu load/purge operations

From: Rasmus Villemoes
Date: Mon Jul 22 2019 - 19:37:33 EST


We have an ERPS (Ethernet Ring Protection Switching) setup involving
mv88e6250 switches which we're in the process of switching to a BSP
based on the mainline driver. Breaking any link in the ring works as
expected, with the ring reconfiguring itself quickly and traffic
continuing with almost no noticable drops. However, when plugging back
the cable, we see 5+ second stalls.

This has been tracked down to the userspace application in charge of
the protocol missing a few CCM messages on the good link (the one that
was not unplugged), causing it to broadcast a "signal fail". That
message eventually reaches its link partner, which responds by
blocking the port. Meanwhile, the first node has continued to block
the port with the just plugged-in cable, breaking the network. And the
reason for those missing CCM messages has in turn been tracked down to
the VTU apparently being too busy servicing load/purge operations that
the normal lookups are delayed.

Initial state, the link between C and D is blocked in software.

_____________________
/ \
| |
A ----- B ----- C *---- D

Unplug the cable between C and D.

_____________________
/ \
| |
A ----- B ----- C * * D

Reestablish the link between C and D.
_____________________
/ \
| |
A ----- B ----- C *---- D

Somehow, enough VTU/ATU operations happen inside C that prevents
the application from receving the CCM messages from B in a timely
manner, so a Signal Fail message is sent by C. When B receives
that, it responds by blocking its port.

_____________________
/ \
| |
A ----- B *---* C *---- D

Very shortly after this, the signal fail condition clears on the
BC link (some CCM messages finally make it through), so C
unblocks the port. However, a guard timer inside B prevents it
from removing the blocking before 5 seconds have elapsed.

It is not unlikely that our userspace ERPS implementation could be
smarter and/or is simply buggy. However, this patch fixes the symptoms
we see, and is a small optimization that should not break anything
(knock wood). The idea is simply to avoid doing an VTU load of an
entry identical to the one already present. To do that, we need to
know whether mv88e6xxx_vtu_get() actually found an existing entry, or
has just prepared a struct mv88e6xxx_vtu_entry for us to load. To that
end, let vlan->valid be an output parameter. The other two callers of
mv88e6xxx_vtu_get() are not affected by this patch since they pass
new=false.

Signed-off-by: Rasmus Villemoes <rasmus.villemoes@xxxxxxxxx>
---
drivers/net/dsa/mv88e6xxx/chip.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 6b17cd961d06..2e500428670c 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -1423,7 +1423,6 @@ static int mv88e6xxx_vtu_get(struct mv88e6xxx_chip *chip, u16 vid,

/* Initialize a fresh VLAN entry */
memset(entry, 0, sizeof(*entry));
- entry->valid = true;
entry->vid = vid;

/* Exclude all ports */
@@ -1618,6 +1617,9 @@ static int _mv88e6xxx_port_vlan_add(struct mv88e6xxx_chip *chip, int port,
if (err)
return err;

+ if (vlan.valid && vlan.member[port] == member)
+ return 0;
+ vlan.valid = true;
vlan.member[port] = member;

err = mv88e6xxx_vtu_loadpurge(chip, &vlan);
--
2.20.1