Re: [PATCH rdma-next] RDMA/mlx5: Don't access NULL-cleared mpi pointer

From: Jason Gunthorpe
Date: Tue Jun 29 2021 - 19:06:40 EST


On Tue, Jun 29, 2021 at 11:51:38AM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@xxxxxxxxxx>
>
> The "dev->port[i].mp.mpi" pointer is set to NULL during
> mlx5_ib_unbind_slave_port() execution, but that field is still needed
> afterwards to add the device to the unaffiliated list.
>
> This flow causes the following kernel panic while unloading the mlx5_ib
> module in multi-port mode, hence the device should be added to the list
> prior to the unbind call.
>
> RPC: Unregistered rdma transport module.
> RPC: Unregistered rdma backchannel transport module.
> BUG: kernel NULL pointer dereference, address: 0000000000000000
> #PF: supervisor write access in kernel mode
> #PF: error_code(0x0002) - not-present page
> PGD 0 P4D 0
> Oops: 0002 [#1] SMP NOPTI
> CPU: 4 PID: 1904 Comm: modprobe Not tainted 5.13.0-rc7_for_upstream_min_debug_2021_06_24_12_08 #1
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
> RIP: 0010:mlx5_ib_cleanup_multiport_master+0x18b/0x2d0 [mlx5_ib]
> Code: 00 04 0f 85 c4 00 00 00 48 89 df e8 ef fa ff ff 48 8b 83 40 0d 00 00 48 8b 15 b9 e8 05 00 4a 8b 44 28 20 48 89 05 ad e8 05 00 <48> c7 00 d0 57 c5 a0 48 89 50 08 48 89 02 39 ab 88 0a 00 00 0f 86
> RSP: 0018:ffff888116ee3df8 EFLAGS: 00010296
> RAX: 0000000000000000 RBX: ffff8881154f6000 RCX: 0000000000000080
> RDX: ffffffffa0c557d0 RSI: ffff88810b69d200 RDI: 000000000002d8a0
> RBP: 0000000000000002 R08: ffff888110780408 R09: 0000000000000000
> R10: ffff88812452e1c0 R11: fffffffffff7e028 R12: 0000000000000000
> R13: 0000000000000080 R14: ffff888102c58000 R15: 0000000000000000
> FS: 00007f884393a740(0000) GS:ffff8882f5a00000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000000 CR3: 00000001249f6004 CR4: 0000000000370ea0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
> mlx5_ib_stage_init_cleanup+0x16/0xd0 [mlx5_ib]
> __mlx5_ib_remove+0x33/0x90 [mlx5_ib]
> mlx5r_remove+0x22/0x30 [mlx5_ib]
> auxiliary_bus_remove+0x18/0x30
> __device_release_driver+0x177/0x220
> driver_detach+0xc4/0x100
> bus_remove_driver+0x58/0xd0
> auxiliary_driver_unregister+0x12/0x20
> mlx5_ib_cleanup+0x13/0x897 [mlx5_ib]
> __x64_sys_delete_module+0x154/0x230
> ? exit_to_user_mode_prepare+0x104/0x140
> do_syscall_64+0x3f/0x80
> entry_SYSCALL_64_after_hwframe+0x44/0xae
> RIP: 0033:0x7f8842e095c7
> Code: 73 01 c3 48 8b 0d d9 48 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d a9 48 2c 00 f7 d8 64 89 01 48
> RSP: 002b:00007ffc68f6e758 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
> RAX: ffffffffffffffda RBX: 00005638207929c0 RCX: 00007f8842e095c7
> RDX: 0000000000000000 RSI: 0000000000000800 RDI: 0000563820792a28
> RBP: 00005638207929c0 R08: 00007ffc68f6d701 R09: 0000000000000000
> R10: 00007f8842e82880 R11: 0000000000000206 R12: 0000563820792a28
> R13: 0000000000000001 R14: 0000563820792a28 R15: 00007ffc68f6fb40
> Modules linked in: xt_MASQUERADE nf_conntrack_netlink nfnetlink iptable_nat xt_addrtype xt_conntrack nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter overlay rdma_ucm ib_iser libiscsi scsi_transport_iscsi rdma_cm iw_cm ib_ipoib ib_cm ib_umad mlx5_ib(-) mlx4_ib ib_uverbs ib_core mlx4_en mlx4_core mlx5_core ptp pps_core [last unloaded: rpcrdma]
> CR2: 0000000000000000
> ---[ end trace a0bb7e20804e9e9b ]---
>
> Fixes: 7ce6095e3bff ("RDMA/mlx5: Don't add slave port to unaffiliated list")
> Reviewed-by: Itay Aveksis <itayav@xxxxxxxxxx>
> Reviewed-by: Maor Gottlieb <maorg@xxxxxxxxxx>
> Signed-off-by: Leon Romanovsky <leonro@xxxxxxxxxx>
> ---
> This is a fix to the patch in for-next.
> ---
> drivers/infiniband/hw/mlx5/main.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)

Applied to for-next, thanks

Jason
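
The ordering problem described in the quoted commit message can be sketched
in plain userspace C. This is only an illustration under stated assumptions,
not the driver code: struct mp_info, struct port, unbind_port() and the two
cleanup_*() helpers are hypothetical stand-ins, and the small list
implementation merely imitates the kernel's list_head. The unbind-first
variant returns an error where the real code dereferenced the cleared
pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list, loosely imitating list_head. */
struct list_node {
    struct list_node *next, *prev;
};

static void list_init(struct list_node *head)
{
    head->next = head;
    head->prev = head;
}

static void list_add_tail(struct list_node *node, struct list_node *head)
{
    assert(node != NULL && head != NULL);
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

static int list_len(const struct list_node *head)
{
    int n = 0;

    for (const struct list_node *p = head->next; p != head; p = p->next)
        n++;
    return n;
}

/* Hypothetical stand-ins for the driver's multiport structures. */
struct mp_info {
    struct list_node list;
};

struct port {
    struct mp_info *mpi;
};

/* Models the unbind step: it clears the port's mpi pointer. */
static void unbind_port(struct port *p)
{
    p->mpi = NULL;
}

/*
 * Problematic order: unbind first, then try to queue the entry.
 * The pointer is already NULL, so nothing can be added. This sketch
 * returns -1 here instead of dereferencing NULL.
 */
static int cleanup_unbind_first(struct port *p, struct list_node *unaffiliated)
{
    unbind_port(p);
    if (p->mpi == NULL)
        return -1;
    list_add_tail(&p->mpi->list, unaffiliated);
    return 0;
}

/* Order described as the fix: queue the entry while mpi is valid, then unbind. */
static int cleanup_add_first(struct port *p, struct list_node *unaffiliated)
{
    if (p->mpi == NULL)
        return -1;
    list_add_tail(&p->mpi->list, unaffiliated);
    unbind_port(p);
    return 0;
}
```

With a port whose mpi points at a valid entry, cleanup_unbind_first()
leaves the list empty and reports failure, while cleanup_add_first() leaves
the entry on the list and the port's pointer cleared.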