Re: [RFC PATCH] RDMA/srp: Fix use-after-free in srp_exit_cmd_priv

From: Jason Gunthorpe
Date: Fri Jun 24 2022 - 18:59:18 EST


On Fri, Jun 24, 2022 at 12:02:53PM +0800, Li Zhijian wrote:
> srp_exit_cmd_priv() accesses srp_device through Scsi_Host, as shown below:
>
>      Scsi_Host           srp_target_port       srp_host        srp_device
> +------------------+ +-- +--------------+  +>+----------+   +->+---------+
> |                  | |   |              |  | |          |   |  |         |
> |                  | |   |  *srp_host   +--+ | *srp_dev +---+  |  *dev   |
> +-+hostdata--------+-+   |              |    |          |      |         |
> | | srp_target_port|     |              |    |          |      |         |
> | |                |     |              |    |          |      |         |
> | |                |     |              |    |          |      |         |
> +-+----------------+---- +--------------+    +----------+      +---------+
>
> But sometimes Scsi_Host still keeps a reference to an srp_host that may
> already have been released. This can happen if I frequently abort
> (Ctrl-C) blktests while it is running, which then causes the error below:
>
> [ 952.299153] Freed by task 17289:
> [ 952.299156] kasan_save_stack+0x1e/0x40
> [ 952.299160] kasan_set_track+0x21/0x30
> [ 952.299164] kasan_set_free_info+0x20/0x30
> [ 952.299169] __kasan_slab_free+0x108/0x170
> [ 952.299173] kfree+0x9a/0x320
> [ 952.299177] srp_remove_one+0x114/0x180 [ib_srp]
> [ 952.299189] remove_client_context+0x8f/0xd0 [ib_core]
> [ 952.299269] disable_device+0xee/0x1e0 [ib_core]
> [ 952.299348] __ib_unregister_device+0x59/0xf0 [ib_core]
> [ 952.299429] ib_unregister_device_and_put+0x3b/0x50 [ib_core]
> [ 952.299509] nldev_dellink+0x126/0x1b0 [ib_core]
> [ 952.299592] rdma_nl_rcv_msg+0x1cc/0x310 [ib_core]
> [ 952.299673] rdma_nl_rcv+0x172/0x200 [ib_core]
> [ 952.299760] netlink_unicast+0x36b/0x4a0
> [ 952.299770] netlink_sendmsg+0x3a9/0x6d0
> [ 952.299774] sock_sendmsg+0x91/0xa0
> [ 952.299783] __sys_sendto+0x16f/0x210
> [ 952.299788] __x64_sys_sendto+0x6f/0x80
> [ 952.299792] do_syscall_64+0x3b/0x90
> [ 952.299795] entry_SYSCALL_64_after_hwframe+0x46/0xb0

I don't even understand how get_device() prevents this call chain??

It looks to me like the problem is srp_remove_one() is not waiting for
or canceling some outstanding work.
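
As a rough userspace illustration of that ordering problem (pthreads
standing in for kernel work; every name below is hypothetical and none of
this is ib_srp code), the teardown has to wait for the outstanding work
before it frees what the work dereferences:

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for a device object that deferred work points at. */
struct demo_dev {
	int value;
};

/* Hypothetical stand-in for a host that owns one outstanding work item. */
struct demo_host {
	struct demo_dev *dev;	/* dereferenced by the worker */
	pthread_t worker;	/* the "outstanding work" */
	int result;		/* written by the worker */
};

/* The deferred work: dereferences host->dev, so dev must still be alive. */
static void *demo_work(void *arg)
{
	struct demo_host *host = arg;

	host->result = host->dev->value + 1;
	return NULL;
}

/* Allocate host and dev, then start the work. Returns NULL on failure. */
static struct demo_host *demo_add_one(int value)
{
	struct demo_host *host = calloc(1, sizeof(*host));

	if (!host)
		return NULL;
	host->dev = calloc(1, sizeof(*host->dev));
	if (!host->dev) {
		free(host);
		return NULL;
	}
	host->dev->value = value;
	if (pthread_create(&host->worker, NULL, demo_work, host)) {
		free(host->dev);
		free(host);
		return NULL;
	}
	return host;
}

/*
 * Teardown: wait for the outstanding work first, free afterwards.
 * Freeing dev before the join would let demo_work() touch freed memory.
 */
static int demo_remove_one(struct demo_host *host)
{
	int result;

	pthread_join(host->worker, NULL);
	result = host->result;
	free(host->dev);
	free(host);
	return result;
}
```

Swapping the join and the free in demo_remove_one() gives the same shape
of use-after-free that the KASAN report shows. In the kernel the wait
would be whatever flush or cancel primitive matches the work in question;
which one applies to ib_srp here is not something this sketch answers.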

Jason