[PATCH for v4.9 LTS 08/87] net/mlx5: Disable RoCE on the e-switch management port under switchdev mode

From: Levin, Alexander (Sasha Levin)
Date: Fri Jul 14 2017 - 21:26:51 EST

From: Or Gerlitz <ogerlitz@xxxxxxxxxxxx>

[ Upstream commit 9da34cd34e85aacc55af8774b81b1f23e86014f9 ]

Under the switchdev/offloads mode, packets that don't match any
e-switch steering rule are sent towards the e-switch management
port. We use a NIC HW steering rule set per vport (uplink and VFs)
to make them be received into the host OS through the respective
vport representor netdevice.

Currnetly such missed RoCE packets will not get to this NIC steering
rule, and hence VF RoCE will not work over the slow path of the offloads
mode. This is b/c these packets will be matched by a steering rule added
by the firmware that serves RoCE traffic set on the PF NIC vport which
is also the e-switch management port under SRIOV.

Disabling RoCE on the e-switch management vport when we are in the offloads
mode, will signal to the firmware to remove their RoCE rule, and then the
missed RoCE packets will be matched by the representor NIC steering rule
as any other missed packets.

To achieve that, we disable RoCE on the PF vport. We do that by removing
(hot-unplugging) the IB device instance associated with the PF. This is
also required by our current model where the PF serves as the uplink
representor and hence only SW switching (TC, bridge, OVS) applications
and slow path vport mlx5e net-device should be running over that vport.

Fixes: c930a3ad7453 ('net/mlx5e: Add devlink based SRIOV mode changes')
Signed-off-by: Or Gerlitz <ogerlitz@xxxxxxxxxxxx>
Reviewed-by: Hadar Hen Zion <hadarh@xxxxxxxxxxxx>
Signed-off-by: Saeed Mahameed <saeedm@xxxxxxxxxxxx>
Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx>
Signed-off-by: Sasha Levin <alexander.levin@xxxxxxxxxxx>
drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c | 11 +++++++++++
1 file changed, 11 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index b08b9e2c6a76..6ffd5d2a70aa 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -672,6 +672,12 @@ int esw_offloads_init(struct mlx5_eswitch *esw, int nvports)
if (err)
goto err_reps;
+ /* disable PF RoCE so missed packets don't go through RoCE steering */
+ mlx5_dev_list_lock();
+ mlx5_remove_dev_by_protocol(esw->dev, MLX5_INTERFACE_PROTOCOL_IB);
+ mlx5_dev_list_unlock();
return 0;

@@ -695,6 +701,11 @@ static int esw_offloads_stop(struct mlx5_eswitch *esw)
int err, err1, num_vfs = esw->dev->priv.sriov.num_vfs;

+ /* enable back PF RoCE */
+ mlx5_dev_list_lock();
+ mlx5_add_dev_by_protocol(esw->dev, MLX5_INTERFACE_PROTOCOL_IB);
+ mlx5_dev_list_unlock();
err = mlx5_eswitch_enable_sriov(esw, num_vfs, SRIOV_LEGACY);
if (err) {