Re: [PATCH net-next v2 09/13] dpaa2-switch: offload FDBs added on an upper bond device
From: Ioana Ciornei
Date: Thu May 14 2026 - 09:51:08 EST
On Thu, May 14, 2026 at 01:46:55PM +0300, Vladimir Oltean wrote:
> On Tue, May 12, 2026 at 04:15:50PM +0300, Ioana Ciornei wrote:
> > +static bool dpaa2_switch_foreign_dev_check(const struct net_device *dev,
> > + const struct net_device *foreign_dev)
> > +{
> > + struct ethsw_port_priv *port_priv = netdev_priv(dev);
> > + struct ethsw_core *ethsw = port_priv->ethsw_data;
> > + struct ethsw_port_priv *other_port;
> > + int i;
> > +
> > + if (netif_is_bridge_master(foreign_dev))
> > + if (port_priv->fdb->bridge_dev == foreign_dev)
> > + return false;
> > +
> > + if (netif_is_bridge_port(foreign_dev)) {
> > + for (i = 0; i < ethsw->sw_attr.num_ifs; i++) {
> > + other_port = ethsw->ports[i];
> > +
> > + if (!other_port || !other_port->lag)
> > + continue;
> > + if (other_port->lag->bond_dev == foreign_dev)
> > + return false;
> > + }
> > + }
> > +
> > + return true;
> > +}
>
> I noticed Sashiko says:
>
> Does this misclassify a non-LAG dpaa2 switch port as foreign? For a
> plain "bridge fdb add aa:bb:cc:dd:ee:ff dev sw0p0 master static",
> br_switchdev_fdb_populate() sets info.dev = sw0p0 and
> __switchdev_handle_fdb_event_to_device() calls
> dpaa2_switch_port_fdb_event() with dev == orig_dev == sw0p0.
> Inside dpaa2_switch_foreign_dev_check(sw0p0, sw0p0):
> netif_is_bridge_master(sw0p0) is false
> netif_is_bridge_port(sw0p0) is true
> the loop only matches when other_port->lag->bond_dev == foreign_dev,
> but sw0p0 is not a bond_dev, so no entry matches
> so the function falls through to "return true" and the FDB event is
> dropped without ever calling dpaa2_switch_port_fdb_add(). Pre-patch the
> equivalent path only required dpaa2_switch_port_dev_check(dev) to be
> true and would have installed the entry.
> The new dpaa2_switch_port_offloads_bridge_port() helper added in
> dpaa2-switch.h covers both the LAG bond_dev case and the direct port
> netdev case:
> static inline bool
> dpaa2_switch_port_offloads_bridge_port(struct ethsw_port_priv *port_priv,
> const struct net_device *dev)
> {
> if (port_priv->lag && port_priv->lag->bond_dev == dev)
> return true;
> if (port_priv->netdev == dev)
> return true;
> return false;
> }
> Should dpaa2_switch_foreign_dev_check() also consult the per-port netdev
> match (e.g. via this helper) so non-LAG bridge ports aren't classified
> as foreign?
>
(...)
> It would be good, however, to do what the LLM says and fix this by
> saying that if netif_is_bridge_port(foreign_dev), the dpaa2-switch
> port %dev is foreign to it if there is no other_port under ethsw which
> has either a lag_dev _or_ a netdev equal to the foreign_dev. This way,
> the driver will be prepared when the switchdev_handle_fdb_event_to_device()
> algorithm changes.
Sure, will add that check also. I can directly use the
dpaa2_switch_port_offloads_bridge_port() function on other_port so
that it's more straight forward.
if (netif_is_bridge_port(foreign_dev)) {
for (i = 0; i < ethsw->sw_attr.num_ifs; i++) {
other_port = ethsw->ports[i];
if (!other_port)
continue;
if (dpaa2_switch_port_offloads_bridge_port(other_port,
foreign_dev))
return false;
}
}
>
> dpaa2_switch_port_event() is documented as called under rcu_read_lock().
> This loop reads other_port->lag and other_port->lag->bond_dev with no
> synchronization, while those fields are mutated under rtnl by
> dpaa2_switch_port_set_lag_group() and the bond-leave path
> (port_priv->lag = NULL; lag->bond_dev = NULL). Pointer reads are
> atomic, but the compound (other_port->lag) && (lag->bond_dev) is not.
> Is the foreign/native classification stable under concurrent bond
> join/leave, or should this rely on RTNL / netdev adjacency RCU lists
> instead?
>
> I think this is valid. DSA has closed this race with the dsa_flush_workqueue()
> call placed after switchdev_bridge_port_unoffload(). It ensures that
> pending FDB events are all processed while the netdev adjacency
> relationships have not yet been destroyed (running in prechangeupper
> context).
I think that what you describe is a slightly different issue than the one
outlined by the LLM. You are talking about a race between the workqueue
processing and a changeupper event while the LLM talks about a race
between the dpaa2_switch_foreign_dev_check() execution and a changeupper
event. This race happens before any work item was even queued.
If another port in the switch is leaving a bond device while we are
executing dpaa2_switch_foreign_dev_check(), then the worst case is that
we could hit a NULL pointer. To avoid that I can rewrite the
dpaa2_switch_port_offloads_bridge_port() as:
static inline bool
dpaa2_switch_port_offloads_bridge_port(struct ethsw_port_priv *port_priv,
const struct net_device *dev)
{
struct dpaa2_switch_lag *lag = port_priv->lag;
if (lag && lag->bond_dev == dev)
return true;
if (port_priv->netdev == dev)
return true;
return false;
}
This way, even if port_priv->lag is changed to NULL concurrently, we do
not end up with a NULL pointer dereference.
Any other lack of synchronization between the foreign_dev_check_cb and
the changeupper events is benign for dpaa2-switch since the driver does
not do anything with the foreign FDBs.