Re: [PATCH net] net/mlx5: Fix double unregister of HCA_PORTS component
From: Tariq Toukan
Date: Thu Dec 04 2025 - 03:27:27 EST
On 03/12/2025 23:10, Farhan Ali wrote:
On 12/2/2025 3:12 AM, Gerd Bayer wrote:
Clear hca_devcom_comp in device's private data after unregistering it in
LAG teardown. Otherwise a slightly lagging second pass through
mlx5_unload_one() might try to unregister it again and trip over
use-after-free.
On s390 almost all PCI level recovery events trigger two passes through
mxl5_unload_one() - one through the poll_health() method and one through
mlx5_pci_err_detected() as callback from generic PCI error recovery.
While testing PCI error recovery paths with more kernel debug features
enabled, this issue reproducibly led to kernel panics with the following
call chain:
Unable to handle kernel pointer dereference in virtual kernel address space
Failing address: 6b6b6b6b6b6b6000 TEID: 6b6b6b6b6b6b6803 ESOP-2 FSI
Fault in home space mode while using kernel ASCE.
AS:00000000705c4007 R3:0000000000000024
Oops: 0038 ilc:3 [#1]SMP
CPU: 14 UID: 0 PID: 156 Comm: kmcheck Kdump: loaded Not tainted
6.18.0-20251130.rc7.git0.16131a59cab1.300.fc43.s390x+debug #1 PREEMPT
Krnl PSW : 0404e00180000000 0000020fc86aa1dc (__lock_acquire+0x5c/0x15f0)
R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
Krnl GPRS: 0000000000000000 0000020f00000001 6b6b6b6b6b6b6c33 0000000000000000
0000000000000000 0000000000000000 0000000000000001 0000000000000000
0000000000000000 0000020fca28b820 0000000000000000 0000010a1ced8100
0000010a1ced8100 0000020fc9775068 0000018fce14f8b8 0000018fce14f7f8
Krnl Code: 0000020fc86aa1cc: e3b003400004 lg %r11,832
0000020fc86aa1d2: a7840211 brc 8,0000020fc86aa5f4
*0000020fc86aa1d6: c09000df0b25 larl %r9,0000020fca28b820
>0000020fc86aa1dc: d50790002000 clc 0(8,%r9),0(%r2)
0000020fc86aa1e2: a7840209 brc 8,0000020fc86aa5f4
0000020fc86aa1e6: c0e001100401 larl %r14,0000020fca8aa9e8
0000020fc86aa1ec: c01000e25a00 larl %r1,0000020fca2f55ec
0000020fc86aa1f2: a7eb00e8 aghi %r14,232
Call Trace:
__lock_acquire+0x5c/0x15f0
lock_acquire.part.0+0xf8/0x270
lock_acquire+0xb0/0x1b0
down_write+0x5a/0x250
mlx5_detach_device+0x42/0x110 [mlx5_core]
mlx5_unload_one_devl_locked+0x50/0xc0 [mlx5_core]
mlx5_unload_one+0x42/0x60 [mlx5_core]
mlx5_pci_err_detected+0x94/0x150 [mlx5_core]
zpci_event_attempt_error_recovery+0xcc/0x388
Fixes: 5a977b5833b7 ("net/mlx5: Lag, move devcom registration to LAG layer")
Signed-off-by: Gerd Bayer <gbayer@xxxxxxxxxxxxx>
---
Hi Shay et al,
while checking for potential regressions by Lukas Wunner's recent work
on pci_save/restore_state() for the recoverability of mlx5 functions I
consistently hit this bug. (Bjorn has queued this up for 6.19, according
to [0] and [1])
Apparently, the issue is unrelated to Lukas' work but can be reproduced
with master. It appears to be timing-sensitive, since it shows up only
when I use s390's debug_defconfig, but I think needs fixing anyhow, as
timing can change for other reasons, too.
I've spotted two additional places where the devcom reference is not
cleared after calling mlx5_devcom_unregister_component() in
drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c that I have not
addressed with a patch, since I'm unclear about how to test these
paths.
Thanks,
Gerd
[0] https://lore.kernel.org/all/cover.1760274044.git.lukas@xxxxxxxxx/
[1] https://lore.kernel.org/linux-pci/ cover.1763483367.git.lukas@xxxxxxxxx/
---
drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c b/ drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
index 3db0387bf6dcb727a65df9d0253f242554af06db..8ec04a5f434dd4f717d6d556649fcc2a584db847 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
@@ -1413,6 +1413,7 @@ static int __mlx5_lag_dev_add_mdev(struct mlx5_core_dev *dev)
static void mlx5_lag_unregister_hca_devcom_comp(struct mlx5_core_dev *dev)
{
mlx5_devcom_unregister_component(dev->priv.hca_devcom_comp);
+ dev->priv.hca_devcom_comp = NULL;
}
Though this fix looks correct to me in freeing hca_devcom_comp (not too familiar with mlx5 internals), I wonder if it would be better to just set devcom = NULL in devcom_free_comp_dev() after the kfree? This would also take care of other places where devcom is not set to NULL?
Setting NULL after the kfree will have no impact, it won't nullify the original field, but the function parameter copy (by-value).
devcom_free_comp_dev() and mlx5_devcom_unregister_component() get struct mlx5_devcom_comp_dev *devcom to work with, they can't nullify it for the caller context.
Thanks
Farhan