Re: [PATCH net] net: dsa: mv88e6xxx: Verify after ATU Load ops

From: Joseph Huang
Date: Wed Mar 05 2025 - 12:45:07 EST


On 3/5/2025 10:14 AM, Andrew Lunn wrote:
On Tue, Mar 04, 2025 at 06:53:51PM -0500, Joseph Huang wrote:
ATU Load operations could fail silently if there's not enough space
on the device to hold the new entry.

Do a Read-After-Write verification after each fdb/mdb add operation
to make sure that the operation was really successful, and return
-ENOSPC otherwise.

Please could you add a description of what the user sees when the ATU
is full. What makes this a bug which needs fixing? I would of thought
at least for unicast addresses, the switch has no entry for the
destination, so sends the packet to the CPU. The CPU will then
software bridge it out the correct port. Reporting ENOSPC will not
change that.

Hi Andrew,

What the user will see when the ATU table is full depends on the unknown flood setting. If a user has unknown multicast flood disabled, what the user will see is that multicast packets are dropped when the ATU table is full. In other words, IGMP snooping is broken when the ATU Load operation fails silently.

Even if the packet is kicked up to the CPU and the CPU intends to forward the packet out via the software bridge, the forwarding attempt is going to be blocked due to the 'offload_fwd_mark' flag in nbp_switchdev_allowed_egress(). Even if that somehow worked, we will not be fully utilizing the hardware's switching capability and will be relying on the CPU to do the forwarding, which will likely result in lower throughput.

Reporting -ENOSPC will not change the fact that the ATU table is full, however it does give switchdev a chance to notify the user and then the user can take some further action accordingly. If nothing else, at least 'bridge monitor' will now report that the entries are not offloaded.

Some other DSA drivers are reporting -ENOSPC as well when the table is full (at least b53 and ocelot).


@@ -2845,7 +2866,8 @@ static int mv88e6xxx_port_fdb_add(struct dsa_switch *ds, int port,
mv88e6xxx_reg_lock(chip);
err = mv88e6xxx_port_db_load_purge(chip, port, addr, vid,
- MV88E6XXX_G1_ATU_DATA_STATE_UC_STATIC);
+ MV88E6XXX_G1_ATU_DATA_STATE_UC_STATIC,
+ true);
mv88e6xxx_reg_unlock(chip);
return err;

@@ -6613,7 +6635,8 @@ static int mv88e6xxx_port_mdb_add(struct dsa_switch *ds, int port,
mv88e6xxx_reg_lock(chip);
err = mv88e6xxx_port_db_load_purge(chip, port, mdb->addr, mdb->vid,
- MV88E6XXX_G1_ATU_DATA_STATE_MC_STATIC);
+ MV88E6XXX_G1_ATU_DATA_STATE_MC_STATIC,
+ true);
mv88e6xxx_reg_unlock(chip);

This change seems bigger than what it needs to be. Rather than modify
mv88e6xxx_port_db_load_purge(), why not perform the lookup just in
these two functions via a helper?

Andrew

I will make that change. Thanks for the review.

Thanks,
Joseph

---
pw-bot: cr