Re: [PATCH net-next v4 10/13] dpaa2-switch: offload FDBs added on an upper bond device

From: Ioana Ciornei

Date: Tue Jun 30 2026 - 10:31:52 EST


On Mon, Jun 29, 2026 at 02:23:06PM +0300, Ioana Ciornei wrote:
> This patch adds support for offloading FDB entries added on upper bond
> devices.
>
> First of all, the call to switchdev_bridge_port_offload() is updated so
> that the notifier blocks needed for FDB events replay are available to
> the bridge core.
>
> Using switchdev_handle_*() helpers is also necessary because each FDB
> event needs to be fanned out to any DPAA2 switch lower device. This
> triggers another change in the return type used by the
> dpaa2_switch_port_fdb_event() - from notifier types to regular errno
> types.
>
> Handling of the SWITCHDEV_FDB_ADD_TO_DEVICE/SWITCHDEV_FDB_DEL_TO_DEVICE
> events is updated so that the newly dpaa2_switch_lag_fdb_add() /
> dpaa2_switch_lag_fdb_del() functions are called anytime a port is under
> a bond device. This will allow us to manage refcounting on FDB entries
> which are added on the upper bond devices.
>
> The DPAA2 switch uses shared-VLAN learning which means that the vid
> parameter is not used when adding an FDB entry to HW. The current
> behavior when dealing with FDB entries with the same MAC address but
> different VLANs is to add the entry to HW every time while removal will
> get done on the first 'bridge fdb del' command issued by the user.
>
> The same behavior is kept also for FDBs added on bond devices by keeping
> the refcount on the {vid, addr} pair while the HW operation disregards
> entirely the vid parameter.
>
> Signed-off-by: Ioana Ciornei <ioana.ciornei@xxxxxxx>
> ---
> Changes in v4:
> - Migrate FDBs in case the primary interface of a LAG changes.
> - Use lag->primary instead of determining each time the primary
> interface of a LAG device
>
> Changes in v3:
> - Update dpaa2_switch_foreign_dev_check() so that we check if there is
> any port in the same switch as dev which offloads foreign_dev in case
> this is a bridge port.
> - Add mutex_destroy on the per LAG fdb_lock
> - Make sure that all FDB events were processed on the workqueue on the
> .remove() path.
> - Delete the refcounted entry in dpaa2_switch_lag_fdb_del() as soon as
> possible, even if the HW deletion would fail
> - Access the port_priv->lag field only through the proper rcu accessors.
>
> Changes in v2:
> - Update dpaa2_switch_foreign_dev_check() so that we check if between
> the switch port and the foreign net_device is an offloaded path. Before
> this change we also checked if the foreign_dev was offloaded or not by
> the switch port.
> - Update the switchdev_bridge_port_unoffload() by passing it the proper
> context and the notifier blocks.
> - Add dev_hold() and dev_put() calls for orig_dev
> ---
> .../ethernet/freescale/dpaa2/dpaa2-switch.c | 227 ++++++++++++++++--
> .../ethernet/freescale/dpaa2/dpaa2-switch.h | 24 ++
> 2 files changed, 225 insertions(+), 26 deletions(-)
>
> diff --git a/drivers/net/ethernet/freescale/dpaa2/dpaa2-switch.c b/drivers/net/ethernet/freescale/dpaa2/dpaa2-switch.c
> index 949a7241a00f..307b3b7a1bfb 100644
> --- a/drivers/net/ethernet/freescale/dpaa2/dpaa2-switch.c
> +++ b/drivers/net/ethernet/freescale/dpaa2/dpaa2-switch.c
> @@ -25,6 +25,9 @@
>
> #define DEFAULT_VLAN_ID 1
>
> +static struct notifier_block dpaa2_switch_port_switchdev_nb;
> +static struct notifier_block dpaa2_switch_port_switchdev_blocking_nb;
> +
> static u16 dpaa2_switch_port_get_fdb_id(struct ethsw_port_priv *port_priv)
> {
> return port_priv->fdb->fdb_id;
> @@ -585,6 +588,81 @@ static int dpaa2_switch_port_fdb_del(struct ethsw_port_priv *port_priv,
> return dpaa2_switch_port_fdb_del_mc(port_priv, addr);
> }
>
> +static struct dpaa2_mac_addr *
> +dpaa2_switch_mac_addr_find(struct list_head *addr_list,
> + const unsigned char *addr, u16 vid)
> +{
> + struct dpaa2_mac_addr *a;
> +
> + list_for_each_entry(a, addr_list, list)
> + if (ether_addr_equal(a->addr, addr) && a->vid == vid)
> + return a;
> +
> + return NULL;
> +}
> +
> +static int dpaa2_switch_lag_fdb_add(struct dpaa2_switch_lag *lag,
> + const unsigned char *addr, u16 vid)
> +{
> + struct ethsw_port_priv *port_priv = lag->primary;
> + struct dpaa2_mac_addr *a;
> + int err = 0;
> +
> + mutex_lock(&lag->fdb_lock);
> +
> + a = dpaa2_switch_mac_addr_find(&lag->fdbs, addr, vid);
> + if (a) {
> + refcount_inc(&a->refcount);
> + goto out;
> + }
> +
> + a = kzalloc(sizeof(*a), GFP_KERNEL);
> + if (!a) {
> + err = -ENOMEM;
> + goto out;
> + }
> +
> + err = dpaa2_switch_port_fdb_add(port_priv, addr);
> + if (err) {
> + kfree(a);
> + goto out;
> + }
> +
> + ether_addr_copy(a->addr, addr);
> + a->vid = vid;
> + refcount_set(&a->refcount, 1);
> + list_add_tail(&a->list, &lag->fdbs);
> +
> +out:
> + mutex_unlock(&lag->fdb_lock);
> +
> + return err;
> +}
> +
> +static void dpaa2_switch_lag_fdb_del(struct dpaa2_switch_lag *lag,
> + const unsigned char *addr, u16 vid)
> +{
> + struct ethsw_port_priv *port_priv = lag->primary;
> + struct dpaa2_mac_addr *a;
> +
> + mutex_lock(&lag->fdb_lock);
> +
> + a = dpaa2_switch_mac_addr_find(&lag->fdbs, addr, vid);
> + if (!a)
> + goto out;
> +
> + if (!refcount_dec_and_test(&a->refcount))
> + goto out;
> +
> + list_del(&a->list);
> + kfree(a);
> +
> + dpaa2_switch_port_fdb_del(port_priv, addr);
> +
> +out:
> + mutex_unlock(&lag->fdb_lock);
> +}
> +
> static void dpaa2_switch_port_get_stats(struct net_device *netdev,
> struct rtnl_link_stats64 *stats)
> {
> @@ -1533,6 +1611,33 @@ bool dpaa2_switch_port_dev_check(const struct net_device *netdev)
> return netdev->netdev_ops == &dpaa2_switch_port_ops;
> }
>
> +static bool dpaa2_switch_foreign_dev_check(const struct net_device *dev,
> + const struct net_device *foreign_dev)
> +{
> + struct ethsw_port_priv *port_priv = netdev_priv(dev);
> + struct ethsw_core *ethsw = port_priv->ethsw_data;
> + struct ethsw_port_priv *other_port;
> + int i;
> +
> + if (netif_is_bridge_master(foreign_dev))
> + if (port_priv->fdb->bridge_dev == foreign_dev)
> + return false;
> +
> + if (netif_is_bridge_port(foreign_dev)) {
> + for (i = 0; i < ethsw->sw_attr.num_ifs; i++) {
> + other_port = ethsw->ports[i];
> +
> + if (!other_port)
> + continue;
> + if (dpaa2_switch_port_offloads_bridge_port(other_port,
> + foreign_dev))
> + return false;
> + }
> + }
> +
> + return true;
> +}
> +
> static int dpaa2_switch_port_connect_mac(struct ethsw_port_priv *port_priv)
> {
> struct fsl_mc_device *dpsw_port_dev, *dpmac_dev;
> @@ -2100,8 +2205,10 @@ static int dpaa2_switch_port_bridge_join(struct net_device *netdev,
> goto err_egress_flood;
>
> brport_dev = dpaa2_switch_port_to_bridge_port(port_priv);
> - err = switchdev_bridge_port_offload(brport_dev, netdev, NULL,
> - NULL, NULL, false, extack);
> + err = switchdev_bridge_port_offload(brport_dev, netdev, port_priv,
> + &dpaa2_switch_port_switchdev_nb,
> + &dpaa2_switch_port_switchdev_blocking_nb,
> + false, extack);
> if (err)
> goto err_switchdev_offload;
>
> @@ -2143,7 +2250,9 @@ static void dpaa2_switch_port_pre_bridge_leave(struct net_device *netdev)
> if (!brport_dev)
> return;
>
> - switchdev_bridge_port_unoffload(brport_dev, NULL, NULL, NULL);
> + switchdev_bridge_port_unoffload(brport_dev, port_priv,
> + &dpaa2_switch_port_switchdev_nb,
> + &dpaa2_switch_port_switchdev_blocking_nb);
>
> /* Make sure that any FDB add/del operations are completed before the
> * bridge layout changes
> @@ -2425,9 +2534,10 @@ static int dpaa2_switch_set_lag_cfg(struct net_device *bond_dev, u8 lag_id,
> struct ethsw_core *ethsw)
> {
> struct dpaa2_switch_lag *lag = &ethsw->lags[lag_id - 1];
> - struct ethsw_port_priv *primary, *new_primary = NULL;
> - struct ethsw_port_priv *port_priv = NULL;
> + struct ethsw_port_priv *primary, *port_priv;
> + struct ethsw_port_priv *new_primary = NULL;
> struct dpsw_lag_cfg cfg = {0};
> + struct dpaa2_mac_addr *a;
> u8 num_ifs = 0;
> int err, i;
>
> @@ -2454,7 +2564,6 @@ static int dpaa2_switch_set_lag_cfg(struct net_device *bond_dev, u8 lag_id,
> /* Build the interface list, always placing the primary first */
> if (primary)
> cfg.if_id[num_ifs++] = primary->idx;
> -
> for (i = 0; i < ethsw->sw_attr.num_ifs; i++) {
> port_priv = ethsw->ports[i];
> if (port_priv == primary)
> @@ -2477,11 +2586,32 @@ static int dpaa2_switch_set_lag_cfg(struct net_device *bond_dev, u8 lag_id,
> lag->in_use = false;
> }
>
> + /* When the primary changes, migrate the FDB entries from the old
> + * primary to the new one: remove them before reconfiguring the LAG in
> + * hardware and re-add them on the new primary afterwards. We do not
> + * touch any refcounting since the intention is to change the HW entry,
> + * not the parallel software tracking.
> + */
> + if (new_primary) {
> + mutex_lock(&lag->fdb_lock);
> + list_for_each_entry(a, &lag->fdbs, list)
> + dpaa2_switch_port_fdb_del(lag->primary, a->addr);
> + mutex_unlock(&lag->fdb_lock);
> + }
> +
> err = dpsw_lag_set(ethsw->mc_io, 0, ethsw->dpsw_handle, &cfg);
> if (err)
> return err;
>

sashiko-nipa notes:

[High, Medium] When the last port leaves the bond, the block
above sets

if (!num_ifs) {
synchronize_net();

lag->bond_dev = NULL;
lag->primary = NULL;
lag->in_use = false;
}

Can a queued workqueue item still race with this teardown?
Looking at dpaa2_switch_event_work():

rcu_read_lock();
lag = rcu_dereference(port_priv->lag);
rcu_read_unlock();

switch (switchdev_work->event) {
case SWITCHDEV_FDB_ADD_TO_DEVICE:
if (lag)
err = dpaa2_switch_lag_fdb_add(lag, fdb_info->addr, ...);

The RCU read section ends before the lag is used, so the
synchronize_net() in set_lag_cfg returns immediately without waiting
for the work. dpaa2_switch_lag_fdb_add() then reads lag->primary while
holding only fdb_lock, which the writer does not take. If lag->primary
has been set to NULL by the writer, port_priv = lag->primary; ... in
dpaa2_switch_lag_fdb_add() will dereference NULL through
dpaa2_switch_port_fdb_add() -> dpaa2_switch_port_fdb_add_uc() reading
port_priv->idx and port_priv->ethsw_data.

The in-file comment claims the lag pointer staying alive is enough, but
lag->primary is a separately mutable field with no shared lock between
this reader and the writer. Should lag->primary itself be protected by
fdb_lock (or by the rtnl/RCU pattern actually waited on), or should
the bond-leave path flush_workqueue() before clearing primary?

A related window exists during the primary migration below: between
unlock of fdb_lock after the add-loop and the lag->primary = new_primary
store, a concurrent work item can still observe the OLD primary value
and install entries on it while the HW LAG is being reconfigured. Is
that intentional?

Not correct. As stated in the commit message for 2/13, any concurrency
between on-going work items and changeupper events is resolved by
flushing the workqueue from the prechangeupper event.

Ioana