[PATCH net-next V2 00/13] net/mlx5: Add switchdev mode support for Socket Direct single netdev, part 1/2

From: Tariq Toukan

Date: Sun May 31 2026 - 07:40:41 EST


Hi,

This series enables Socket Direct single netdev to operate in switchdev
mode with shared FDB. See detailed feature description by Shay below.

Regards,
Tariq


This series enables Socket Direct single netdev to operate in switchdev
mode with shared FDB. SD single netdev combines multiple PCI functions
behind a single netdev interface. To support switchdev offloads, these
functions must participate in virtual LAG (shared FDB).

Design

Rather than introducing a separate LAG instance for SD, this series
integrates SD secondary devices into the existing LAG structure
(priv.lag) created at probe time. Each lag_func entry carries a
group_id field that identifies its SD group membership (0 means not
part of any SD group). An xarray mark (XA_MARK_PORT) distinguishes
physical port entries from SD secondaries, enabling a single unified
iterator that filters by group:

- MLX5_LAG_FILTER_PORTS: iterate port-level entries only (existing
behavior, used by bonding, FW LAG commands, v2p_map)
- MLX5_LAG_FILTER_ALL: iterate all devices including SD secondaries
(used by MPESW shared FDB across all devices)
- specific group_id: iterate only devices in that SD group (used by
per-group SD shared FDB operations)

Existing callers use mlx5_ldev_for_each() which maps to
MLX5_LAG_FILTER_PORTS, preserving current behavior for non-SD
configurations.
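
For illustration only, the three filter modes can be modeled in plain
userspace C as below. This is a minimal sketch and not the driver code:
struct lag_entry, the LAG_FILTER_* sentinel values and the helper names
are hypothetical stand-ins, an array replaces the xarray, and the
is_port flag replaces the XA_MARK_PORT mark.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sentinel values; the real ones are defined by the series. */
#define LAG_FILTER_PORTS (-1)
#define LAG_FILTER_ALL   (-2)

/* Simplified stand-in for a lag_func entry. */
struct lag_entry {
	bool is_port;  /* models the XA_MARK_PORT mark */
	int group_id;  /* 0 means not part of any SD group */
};

/* Decide whether one entry is visible under the given filter. */
static bool lag_entry_matches(const struct lag_entry *e, int filter)
{
	if (filter == LAG_FILTER_PORTS)
		return e->is_port;
	if (filter == LAG_FILTER_ALL)
		return true;
	/* Any positive value is treated as a specific SD group_id. */
	return filter > 0 && e->group_id == filter;
}

/* Return the index of the next matching entry at or after start, or -1. */
static int lag_next_match(const struct lag_entry *tbl, size_t n,
			  size_t start, int filter)
{
	for (size_t i = start; i < n; i++)
		if (lag_entry_matches(&tbl[i], filter))
			return (int)i;
	return -1;
}

/* Count the entries one walk with the given filter would visit. */
static size_t lag_count(const struct lag_entry *tbl, size_t n, int filter)
{
	size_t count = 0;

	for (int i = lag_next_match(tbl, n, 0, filter); i >= 0;
	     i = lag_next_match(tbl, n, (size_t)i + 1, filter))
		count++;
	return count;
}
```

With two ports that are each the primary of one SD group, plus one
secondary per group, a ports-only walk in this model visits two entries,
an all-devices walk visits four, and a walk for a single group_id visits
that group's primary and secondary.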

Lifecycle and ownership

The SD LAG lifecycle is tied to the SD group, not to bonding events:

1. At PCI probe, mlx5_lag_add_mdev() creates the LAG structure
(priv.lag) for each LAG-capable PF, e.g. SD primary devices.

2. During mlx5_sd_init(), after the SD group is fully formed (primary
and secondaries paired), sd_lag_init() registers the secondary
devices into the primary's existing priv.lag by calling
mlx5_ldev_add_mdev() with the SD group_id. The primary's lag_func
also gets its group_id set. No separate LAG instance is created.

3. After all the devices in the SD group transition to switchdev,
mlx5_lag_shared_fdb_create() is invoked with the group_id to create
a software-only shared FDB scoped to that SD group. This sets
sd_fdb_active on all lag_func entries in the group. No FW LAG
commands are issued since SD devices share the same physical port.

4. If MPESW (multi-port eswitch) is enabled on top of SD groups, the
per-group SD shared FDB is torn down first, then MPESW shared FDB is
created spanning all devices (ports + SD secondaries) using
MLX5_LAG_FILTER_ALL. On MPESW disable, per-group SD shared FDB is
restored.

5. On SD teardown (mlx5_sd_cleanup or device unbind), sd_lag_cleanup()
removes secondaries from priv.lag and clears the primary's group_id.
The LAG structure itself is not destroyed.

The sd_fdb_active flag is set on all lag_func entries in a group (not
just the primary), so any device can detect the SD shared FDB state
during lag_disable_change teardown without needing to look up peer
entries.
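
The flag handling in steps 3 and 4 can be sketched as a small userspace
model. This is an assumption-laden illustration, not the series' code:
struct sd_lag_entry and all function names are hypothetical, there is no
locking or steering setup, and MPESW disable is simplified to restoring
the per-group state for every SD group.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for a lag_func entry. */
struct sd_lag_entry {
	int group_id;        /* 0 means not part of any SD group */
	bool sd_fdb_active;  /* mirrored on every member of the group */
};

/*
 * Set or clear the flag on every entry of one SD group.
 * Returns the number of entries touched.
 */
static size_t sd_fdb_set_active(struct sd_lag_entry *tbl, size_t n,
				int group_id, bool active)
{
	size_t touched = 0;

	if (group_id == 0)
		return 0; /* group 0 is "no SD group", nothing to do */

	for (size_t i = 0; i < n; i++) {
		if (tbl[i].group_id != group_id)
			continue;
		tbl[i].sd_fdb_active = active;
		touched++;
	}
	return touched;
}

/* Any member can answer for its group without a peer lookup. */
static bool sd_fdb_is_active(const struct sd_lag_entry *e)
{
	return e->sd_fdb_active;
}

/* Model of MPESW enable: per-group SD shared FDB is torn down first. */
static void model_mpesw_enable(struct sd_lag_entry *tbl, size_t n)
{
	for (size_t i = 0; i < n; i++)
		tbl[i].sd_fdb_active = false;
}

/* Model of MPESW disable: per-group SD shared FDB is restored. */
static void model_mpesw_disable(struct sd_lag_entry *tbl, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (tbl[i].group_id != 0)
			tbl[i].sd_fdb_active = true;
}
```

Because the flag is written to every member, a teardown path that only
holds a secondary's entry can call sd_fdb_is_active() on it directly.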

SD shared FDB is a pure software construct -- unlike regular LAG modes
(ROCE, SRIOV, MPESW), it does not issue FW create_lag/destroy_lag
commands. The software vport LAG for SD is implemented via eswitch
egress ACL bounce rules, managed by the IB layer through
mlx5_eth_lag_init(). The software LAG demux is implemented via
steering rules that use a new destination, VHCA_RX.

Patches

Infrastructure (patches 1, 5-6):
- Factor out shared FDB code into a dedicated file
- Extend lag_func with group_id and sd_fdb_active fields;
add XA_MARK_PORT and unified iterator with group_id filter
- Extend shared FDB API with group_id parameter

E-Switch preparation (patches 2-3):
- Align eswitch disable sequence ordering
- Move devcom init from TC to eswitch layer

SD group management (patches 4, 7-9):
- Replace peer count check with direct peer lookup
- Register SD secondaries in the existing LAG at SD init time
- Block RoCE and VF LAG for SD devices
- Block multipath LAG for SD devices

Switchdev integration (patch 10):
- Keep netdev resources local in switchdev mode

Steering (patches 11-12):
- Track peer flow slots with bitmap for selective peer flow deletion
- Enable TC flow steering for SD LAG
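
As a rough illustration of the bitmap idea in patch 11, the sketch below
tracks which peer slots hold a peer flow so that a single slot can be
deleted without touching the others. It is a generic userspace example
under stated assumptions: struct flow_model, MAX_PEER_SLOTS and the
helper names are hypothetical and do not reflect the layout used in
en_tc.c.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical upper bound on peer slots for this model. */
#define MAX_PEER_SLOTS 32u

/* Simplified flow: one bit per peer slot that holds a peer flow. */
struct flow_model {
	uint32_t peer_slots;
};

static bool flow_has_peer(const struct flow_model *f, unsigned int slot)
{
	return slot < MAX_PEER_SLOTS &&
	       (f->peer_slots & (UINT32_C(1) << slot)) != 0;
}

/* Record that a peer flow now exists in the given slot. */
static bool flow_add_peer(struct flow_model *f, unsigned int slot)
{
	if (slot >= MAX_PEER_SLOTS)
		return false;
	f->peer_slots |= UINT32_C(1) << slot;
	return true;
}

/*
 * Delete the peer flow of one slot only. Returns false when the slot
 * holds no peer flow, so callers can skip the hardware teardown.
 */
static bool flow_del_peer(struct flow_model *f, unsigned int slot)
{
	if (!flow_has_peer(f, slot))
		return false;
	f->peer_slots &= ~(UINT32_C(1) << slot);
	return true;
}
```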

Enablement (patch 13):
- Verify unique vhca_id count for cross-VHCA RQT
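
To show why counting unique vhca_ids differs from checking their range
(the contrast named in the title of patch 13), here is a small userspace
sketch. The helper names and the max_unique limit are hypothetical; the
actual check in en/rqt.c may use different inputs and limits.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Count distinct values; quadratic, which is fine for a short list. */
static size_t count_unique_vhca_ids(const uint16_t *ids, size_t n)
{
	size_t unique = 0;

	for (size_t i = 0; i < n; i++) {
		bool seen = false;

		for (size_t j = 0; j < i; j++) {
			if (ids[j] == ids[i]) {
				seen = true;
				break;
			}
		}
		if (!seen)
			unique++;
	}
	return unique;
}

/* Count-based check: only the number of distinct ids matters. */
static bool vhca_unique_count_fits(const uint16_t *ids, size_t n,
				   size_t max_unique)
{
	return count_unique_vhca_ids(ids, n) <= max_unique;
}

/* Range-based check, for comparison: sparse ids inflate the span. */
static bool vhca_range_fits(const uint16_t *ids, size_t n,
			    size_t max_unique)
{
	uint16_t lo, hi;

	if (n == 0)
		return true;
	lo = hi = ids[0];
	for (size_t i = 1; i < n; i++) {
		if (ids[i] < lo)
			lo = ids[i];
		if (ids[i] > hi)
			hi = ids[i];
	}
	return (size_t)(hi - lo) + 1 <= max_unique;
}
```

In this model, the ids {3, 40, 3, 40} contain two distinct values, so a
count-based check with a limit of two passes, while the range-based
check sees a span of 38 and fails.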

V2:
- Fix kdoc warning in mlx5_lag_shared_fdb_create()

V1:
https://lore.kernel.org/all/20260527125427.385976-1-tariqt@xxxxxxxxxx/

Shay Drory (13):
net/mlx5: LAG, factor out shared FDB code into dedicated file
net/mlx5: E-Switch, align disable sequence with switchdev-to-legacy
transition
net/mlx5: E-Switch, move devcom init from TC to eswitch layer
net/mlx5: LAG, replace peer count check with direct peer lookup
net/mlx5: LAG, prepare for SD device integration
net/mlx5: LAG, extend shared FDB API with group_id filter
net/mlx5: SD, introduce Socket Direct LAG
net/mlx5: LAG, block RoCE and VF LAG for SD devices
net/mlx5: LAG, block multipath LAG for SD devices
net/mlx5: SD, keep netdev resources on same PF in switchdev mode
net/mlx5e: TC, track peer flow slots with bitmap
net/mlx5e: TC, enable steering for SD LAG
net/mlx5e: Verify unique vhca_id count instead of range

.../net/ethernet/mellanox/mlx5/core/Makefile | 2 +-
.../net/ethernet/mellanox/mlx5/core/en/rqt.c | 27 +-
.../ethernet/mellanox/mlx5/core/en/tc_priv.h | 7 +
.../net/ethernet/mellanox/mlx5/core/en_tc.c | 83 ++--
.../net/ethernet/mellanox/mlx5/core/eswitch.h | 11 +-
.../mellanox/mlx5/core/eswitch_offloads.c | 26 ++
.../net/ethernet/mellanox/mlx5/core/lag/lag.c | 429 ++++++++++--------
.../net/ethernet/mellanox/mlx5/core/lag/lag.h | 100 +++-
.../net/ethernet/mellanox/mlx5/core/lag/mp.c | 4 +
.../ethernet/mellanox/mlx5/core/lag/mpesw.c | 28 +-
.../mellanox/mlx5/core/lag/shared_fdb.c | 235 ++++++++++
.../net/ethernet/mellanox/mlx5/core/lib/sd.c | 227 +++++++--
.../net/ethernet/mellanox/mlx5/core/lib/sd.h | 23 +
.../net/ethernet/mellanox/mlx5/core/main.c | 3 +-
14 files changed, 916 insertions(+), 289 deletions(-)
create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/lag/shared_fdb.c


base-commit: 8415598365503ced2e3d019491b0a2756c85c494
--
2.44.0