[PATCH 5.7 096/166] net/mlx5e: Fix CPU mapping after function reload to avoid aRFS RX crash

From: Greg Kroah-Hartman
Date: Tue Jul 14 2020 - 15:02:22 EST


From: Aya Levin <ayal@xxxxxxxxxxxx>

[ Upstream commit f4aebbfb56ed0c186adbeb2799df836da50f78e3 ]

After function reload, CPU mapping used by aRFS RX is broken, leading to
a kernel panic. Fix by moving initialization of rx_cpu_rmap from
netdev_init to netdev_attach. IRQ table is re-allocated on mlx5_load,
but netdev is not re-initialize.

Trace of the panic:
[ 22.055672] general protection fault, probably for non-canonical address 0x785634120000ff1c: 0000 [#1] SMP PTI
[ 22.065010] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 5.7.0-rc2-for-upstream-perf-2020-04-21_16-34-03-31 #1
[ 22.067967] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 22.071174] RIP: 0010:get_rps_cpu+0x267/0x300
[ 22.075692] RSP: 0018:ffffc90000244d60 EFLAGS: 00010202
[ 22.076888] RAX: ffff888459b0e400 RBX: 0000000000000000 RCX:0000000000000007
[ 22.078364] RDX: 0000000000008884 RSI: ffff888467cb5b00 RDI:0000000000000000
[ 22.079815] RBP: 00000000ff342b27 R08: 0000000000000007 R09:0000000000000003
[ 22.081289] R10: ffffffffffffffff R11: 00000000000070cc R12:ffff888454900000
[ 22.082767] R13: ffffc90000e5a950 R14: ffffc90000244dc0 R15:0000000000000007
[ 22.084190] FS: 0000000000000000(0000) GS:ffff88846fc80000(0000)knlGS:0000000000000000
[ 22.086161] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 22.087427] CR2: ffffffffffffffff CR3: 0000000464426003 CR4:0000000000760ee0
[ 22.088888] DR0: 0000000000000000 DR1: 0000000000000000 DR2:0000000000000000
[ 22.090336] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:0000000000000400
[ 22.091764] PKRU: 55555554
[ 22.092618] Call Trace:
[ 22.093442] <IRQ>
[ 22.094211] ? kvm_clock_get_cycles+0xd/0x10
[ 22.095272] netif_receive_skb_list_internal+0x258/0x2a0
[ 22.096460] gro_normal_list.part.137+0x19/0x40
[ 22.097547] napi_complete_done+0xc6/0x110
[ 22.098685] mlx5e_napi_poll+0x190/0x670 [mlx5_core]
[ 22.099859] net_rx_action+0x2a0/0x400
[ 22.100848] __do_softirq+0xd8/0x2a8
[ 22.101829] irq_exit+0xa5/0xb0
[ 22.102750] do_IRQ+0x52/0xd0
[ 22.103654] common_interrupt+0xf/0xf
[ 22.104641] </IRQ>

Fixes: 4383cfcc65e7 ("net/mlx5: Add devlink reload")
Signed-off-by: Aya Levin <ayal@xxxxxxxxxxxx>
Reviewed-by: Eran Ben Elisha <eranbe@xxxxxxxxxxxx>
Signed-off-by: Saeed Mahameed <saeedm@xxxxxxxxxxxx>
Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>
---
drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 02f6b6bd2847c..bc54913c58618 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -5119,6 +5119,10 @@ static int mlx5e_init_nic_rx(struct mlx5e_priv *priv)
if (err)
goto err_destroy_flow_steering;

+#ifdef CONFIG_MLX5_EN_ARFS
+ priv->netdev->rx_cpu_rmap = mlx5_eq_table_get_rmap(priv->mdev);
+#endif
+
return 0;

err_destroy_flow_steering:
@@ -5296,10 +5300,6 @@ int mlx5e_netdev_init(struct net_device *netdev,
/* netdev init */
netif_carrier_off(netdev);

-#ifdef CONFIG_MLX5_EN_ARFS
- netdev->rx_cpu_rmap = mlx5_eq_table_get_rmap(mdev);
-#endif
-
return 0;

err_free_cpumask:
--
2.25.1