Re: [PATCH net-next v4 10/13] dpaa2-switch: offload FDBs added on an upper bond device
From: Ioana Ciornei
Date: Tue Jun 30 2026 - 10:46:35 EST
On Mon, Jun 29, 2026 at 02:23:06PM +0300, Ioana Ciornei wrote:
> This patch adds support for offloading FDB entries added on upper bond
> devices.
>
> First of all, the call to switchdev_bridge_port_offload() is updated so
> that the notifier blocks needed for FDB events replay are available to
> the bridge core.
>
> Using switchdev_handle_*() helpers is also necessary because each FDB
> event needs to be fanned out to any DPAA2 switch lower device. This
> triggers another change in the return type used by the
> dpaa2_switch_port_fdb_event() - from notifier types to regular errno
> types.
>
> Handling of the SWITCHDEV_FDB_ADD_TO_DEVICE/SWITCHDEV_FDB_DEL_TO_DEVICE
> events is updated so that the newly dpaa2_switch_lag_fdb_add() /
> dpaa2_switch_lag_fdb_del() functions are called anytime a port is under
> a bond device. This will allow us to manage refcounting on FDB entries
> which are added on the upper bond devices.
>
> The DPAA2 switch uses shared-VLAN learning which means that the vid
> parameter is not used when adding an FDB entry to HW. The current
> behavior when dealing with FDB entries with the same MAC address but
> different VLANs is to add the entry to HW every time while removal will
> get done on the first 'bridge fdb del' command issued by the user.
>
> The same behavior is kept also for FDBs added on bond devices by keeping
> the refcount on the {vid, addr} pair while the HW operation disregards
> entirely the vid parameter.
>
> Signed-off-by: Ioana Ciornei <ioana.ciornei@xxxxxxx>
> ---
> Changes in v4:
> - Migrate FDBs in case the primary interface of a LAG changes.
> - Use lag->primary instead of determining each time the primary
> interface of a LAG device
>
> Changes in v3:
> - Update dpaa2_switch_foreign_dev_check() so that we check if there is
> any port in the same switch as dev which offloads foreign_dev in case
> this is a bridge port.
> - Add mutex_destroy on the per LAG fdb_lock
> - Make sure that all FDB events were processed on the workqueue on the
> .remove() path.
> - Delete the refcounted entry in dpaa2_switch_lag_fdb_del() as soon as
> possible, even if the HW deletion would fail
> - Access the port_priv->lag field only through the proper rcu accessors.
>
> Changes in v2:
> - Update dpaa2_switch_foreign_dev_check() so that we check if between
> the switch port and the foreign net_device is an offloaded path. Before
> this change we also checked if the foreign_dev was offloaded or not by
> the switch port.
> - Update the switchdev_bridge_port_unoffload() by passing it the proper
> context and the notifier blocks.
> - Add dev_hold() and dev_put() calls for orig_dev
> ---
(...)
> @@ -2454,7 +2564,6 @@ static int dpaa2_switch_set_lag_cfg(struct net_device *bond_dev, u8 lag_id,
> /* Build the interface list, always placing the primary first */
> if (primary)
> cfg.if_id[num_ifs++] = primary->idx;
> -
> for (i = 0; i < ethsw->sw_attr.num_ifs; i++) {
> port_priv = ethsw->ports[i];
> if (port_priv == primary)
> @@ -2477,11 +2586,32 @@ static int dpaa2_switch_set_lag_cfg(struct net_device *bond_dev, u8 lag_id,
> lag->in_use = false;
> }
sashiko.dev notes:
Does this leak the dynamically allocated items in the lag->fdbs
list? When the last interface leaves a LAG, lag->in_use is set
to false here, but the elements in lag->fdbs are not freed.
The mechanism initiated by nbp_switchdev_unsync_objs() will remove all
the lag->fdbs automatically without the need to add extra code which
frees manually all entries.
>
> + /* When the primary changes, migrate the FDB entries from the old
> + * primary to the new one: remove them before reconfiguring the LAG in
> + * hardware and re-add them on the new primary afterwards. We do not
> + * touch any refcounting since the intention is to change the HW entry,
> + * not the parallel software tracking.
> + */
> + if (new_primary) {
> + mutex_lock(&lag->fdb_lock);
> + list_for_each_entry(a, &lag->fdbs, list)
> + dpaa2_switch_port_fdb_del(lag->primary, a->addr);
> + mutex_unlock(&lag->fdb_lock);
> + }
sashiko.dev notes:
Does dropping the fdb_lock here open a race window for leaking
FDB entries? While the lock is dropped, a concurrent
asynchronous work item could execute dpaa2_switch_lag_fdb_add(),
acquire fdb_lock, add a new FDB entry to the old lag->primary in
hardware, and append it to the software lag->fdbs list.
When this thread re-acquires the lock below, it will loop over
lag->fdbs and add all entries to the new primary. The new entry
would then be programmed on both the old and new primary ports.
Later, a delete operation would only remove it from the new
primary, permanently leaking the hardware entry on the old port.
The LAG primary changes only on bridge leave which will not run
concurrently with any workqueue events since there is a
flush_workqueue() in the pre bridge leave event.
> static void dpaa2_switch_event_work(struct work_struct *work)
> {
> struct ethsw_switchdev_event_work *switchdev_work =
> container_of(work, struct ethsw_switchdev_event_work, work);
> + struct net_device *orig_dev = switchdev_work->orig_dev;
> struct net_device *dev = switchdev_work->dev;
> + struct ethsw_port_priv *port_priv = netdev_priv(dev);
> struct switchdev_notifier_fdb_info *fdb_info;
> + struct dpaa2_switch_lag *lag;
> int err;
>
> fdb_info = &switchdev_work->fdb_info;
>
> + /* The lag structures are freed only from dpaa2_switch_remove(), which
> + * first flushes this workqueue, so the pointer stays valid for the
> + * lifetime of the work item. Only the dereference needs the RCU
> + * read-side lock; the FDB helpers below can sleep and must run outside
> + * of it.
> + */
> + rcu_read_lock();
> + lag = rcu_dereference(port_priv->lag);
> + rcu_read_unlock();
sashiko.dev notes:
Does this sequence leave lag unprotected against concurrent
teardown?
By dropping the RCU read lock here, a concurrent call to
dpaa2_switch_set_lag_cfg() could set lag->primary to NULL and
call synchronize_net(). Since the RCU read lock is dropped,
synchronize_net() would not wait for this work item to finish.
If the work item then resumes, dpaa2_switch_lag_fdb_add() reads
lag->primary locklessly:
dpaa2_switch_lag_fdb_add() {
...
struct ethsw_port_priv *port_priv = lag->primary;
err = dpaa2_switch_port_fdb_add(port_priv, addr);
This would pass a NULL port_priv to dpaa2_switch_port_fdb_add(),
potentially causing a NULL pointer dereference.
Again, the flush_workqueue() avoids these kind of concurrencies.
Ioana