Re: [PATCH net-next v2 0/7] net: lan966x: Add lag support
From: Vladimir Oltean
Date: Wed Jun 29 2022 - 08:26:42 EST
Hi Horatiu,
On Mon, Jun 27, 2022 at 10:13:23PM +0200, Horatiu Vultur wrote:
> Add lag support for lan966x.
> First 4 patches don't do any changes to the current behaviour, they
> just prepare for lag support. While the rest is to add the lag support.
>
> v1->v2:
> - fix the LAG PGIDs when ports go down, in this way is not
> needed anymore the last patch of the series.
>
> Horatiu Vultur (7):
> net: lan966x: Add reqisters used to configure lag interfaces
> net: lan966x: Split lan966x_fdb_event_work
> net: lan966x: Expose lan966x_switchdev_nb and
> lan966x_switchdev_blocking_nb
> net: lan966x: Extend lan966x_foreign_bridging_check
> net: lan966x: Add lag support for lan966x.
> net: lan966x: Extend FDB to support also lag
> net: lan966x: Extend MAC to support also lag interfaces.
>
> .../net/ethernet/microchip/lan966x/Makefile | 2 +-
> .../ethernet/microchip/lan966x/lan966x_fdb.c | 153 ++++++---
> .../ethernet/microchip/lan966x/lan966x_lag.c | 322 ++++++++++++++++++
> .../ethernet/microchip/lan966x/lan966x_mac.c | 66 +++-
> .../ethernet/microchip/lan966x/lan966x_main.h | 41 +++
> .../ethernet/microchip/lan966x/lan966x_regs.h | 45 +++
> .../microchip/lan966x/lan966x_switchdev.c | 115 +++++--
> 7 files changed, 654 insertions(+), 90 deletions(-)
> create mode 100644 drivers/net/ethernet/microchip/lan966x/lan966x_lag.c
>
> --
> 2.33.0
>
I've downloaded and applied your patches and I have some general feedback.
Some of it relates to changes which were not made and hence I couldn't
have commented on the patches themselves, so I'm posting it here.
1. switchdev_bridge_port_offload() returns an error code if object
replay failed, or if it couldn't get the port parent id, or if the user
tries to join a lan966x port and a port belonging to another switchdev
driver to the same LAG. It would be good to propagate this error and not
ignore it.
Side note: maybe this could help to eliminate the extra logic you need
to add to lan966x_foreign_bridging_check().
2. lan966x_foreign_dev_check() seems wrong/misunderstood. Currently it
reports that a LAG upper is a foreign interface (unoffloaded). In turn,
this makes switchdev_lower_dev_find() not find any lan966x interface
beneath a LAG, and hence, __switchdev_handle_fdb_event_to_device() would
not recurse to the lan966x "dev" below a LAG when the "orig_dev" of an
FDB event is the bridge itself. Otherwise said, if you have no direct
lan966x port under a bridge, but just bridge -> LAG -> lan966x, you will
miss all local (host-filtered) FDB event notifications that you should
otherwise learn towards the CPU.
3. The implementation of lan966x_lag_mac_add_entry(), with that first
call to lan966x_mac_del_entry(), seems a hack. Why do you need to do
that?
4. The handling of lan966x->mac_lock seems wrong in general, not just
particular to this patch set. In particular, it appears to protect too
little in lan966x_mac_add_entry(), i.e. just the list_add_tail.
This makes it possible for lan966x_mac_lookup and lan966x_mac_learn to
be concurrent with lan966x_mac_del_entry(). In turn, this appears bad
first and foremost for the hardware access interface, since the MAC
table access is indirect, and if you allow multiple threads to
concurrently call lan966x_mac_select(), change the command in
ANA_MACACCESS, and poll for command completion, things will go sideways
very quickly (one command will inadvertently poll for the completion of
another, which inadvertently operates on the row/column selected by yet
a third command, all that due to improper serialization).
5. There is a race between lan966x_fdb_lag_event_work() calling
lan966x_lag_first_port(), and lan966x_lag_port_leave() changing
port->bond = NULL. Specifically, when a lan966x port leaves a LAG, there
might still be deferred FDB events (add or del) which are still pending.
There exists a dead time during which you will ignore these, because you
think that the first lan966x LAG port isn't the first lan966x LAG port,
which will lead to a desynchronization between the bridge FDB and the
hardware FDB.
In DSA we solved this by flushing lan966x->fdb_work inside
lan966x_port_prechangeupper() on leave. This waits for the pending
events to finish, and the bridge will not emit further events.
It's important to do this in prechangeupper() rather than in
changeupper() because switchdev_handle_fdb_event_to_device() needs the
upper/lower relationship to still exist to function properly, and in
changeupper() it has already been destroyed.
Side note: if you flush lan966x->fdb_work, then you have an upper bound
for how long can lan966x_fdb_event_work be deferred. Specifically, you
can remove the dev_hold() and dev_put() calls, since it surely can't be
deferred until after the netdev is unregistered. The bounding event is
much quicker - the lan966x port leaves the LAG.
6. You are missing LAG FDB migration logic in lan966x_lag_port_join().
Specifically, you assume that the lan966x_lag_first_port() will never
change, probably because you just make the switch ports join the LAG in
the order 1, 2, 3. But they can also join in the order 3, 2, 1.