use-after-free in srpt_enable_tpg()

From: lizhijian@xxxxxxxxxxx
Date: Mon Jun 27 2022 - 03:10:51 EST


Hey folks

Another use-after-free was detected. This could be happens if i frequently abort (Ctrl-c) the blktests during it was running and then cause below error.
After digging to the coredump, i found that's because userspace API /sys/kernel/config/target/srpt/5054:00ff:fec9:7674/ still exists though the kernel
had released the srpt resource. So if the blktests access this userspace API, use-after-free(panic) will happen.

And i also noticed that similar issue had been fixed but reverted(see below) later years ago

commit 9b64f7d0bb0a8b5987f265756a563384765c7378
Author: Bart Van Assche <bvanassche@xxxxxxx>
Date: Mon Sep 30 16:17:07 2019 -0700

RDMA/srpt: Postpone HCA removal until after configfs directory removal

A shortcoming of the SCSI target core is that it does not have an API
for removing tpg or wwn objects. Wait until these directories have been
removed before allowing HCA removal to finish.

See also Bart Van Assche, "Re: Why using configfs as the only interface is
wrong for a storage target", 2011-02-07
(https://www.spinics.net/lists/linux-scsi/msg50248.html).


commit 77cf98d4ec90e8c48592c6537cfc2281c58f7ac3
Author: Bart Van Assche <bvanassche@xxxxxxx>
Date:   Fri Nov 1 13:47:56 2019 -0700

    Revert "RDMA/srpt: Postpone HCA removal until after configfs directory removal"

    Although the mentioned patch fixes a use-after-free bug, it introduces a
    hang during shutdown. Since the latter is worse, revert this patch.

    Link: https://lore.kernel.org/r/20191101204756.182162-1-bvanassche@xxxxxxx
    Reported-by: Honggang Li <honli@xxxxxxxxxx>
    Fixes: 9b64f7d0bb0a ("RDMA/srpt: Postpone HCA removal until after configfs directory removal")
    Signed-off-by: Bart Van Assche <bvanassche@xxxxxxx>
    Acked-by: Honggang Li <honli@xxxxxxxxxx>
    Signed-off-by: Jason Gunthorpe <jgg@xxxxxxxxxxxx>


So far, I doubt if the previous defect of configfs mentioned in 9b64f7d0b: "(RDMA/srpt: Postpone HCA removal until after configfs directory removal)"
has got a better solution. if not, i have no a clear mechanism to avoid it yet.

feedbacks are very welcome.

[  103.202207] ==================================================================
[  103.202213] BUG: KASAN: use-after-free in srpt_enable_tpg+0x70/0xec [ib_srpt]
[  103.202230] Read of size 8 at addr ffff888107d3d120 by task check/1370
[  103.202233]
[  103.202234] CPU: 0 PID: 1370 Comm: check Not tainted 5.19.0-rc1-roce-flush+ #85
[  103.202238] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-27-g64f37cc530f1-prebuilt.qemu.org 04/01/2014
[  103.202241] Call Trace:
[  103.202246]  <TASK>
[  103.202247]  dump_stack_lvl+0x34/0x44
[  103.202263]  print_report.cold+0x5e/0x5db
[  103.202277]  ? srpt_enable_tpg+0x70/0xec [ib_srpt]
[  103.202287]  kasan_report+0xab/0x120
[  103.202297]  ? srpt_enable_tpg+0x70/0xec [ib_srpt]
[  103.202307]  srpt_enable_tpg+0x70/0xec [ib_srpt]
[  103.202318]  target_fabric_tpg_base_enable_store+0xcc/0x130 [target_core_mod]
[  103.202364]  ? target_fabric_wwn_cmd_completion_affinity_show+0x40/0x40 [target_core_mod]
[  103.202408]  ? rwsem_down_read_slowpath+0x730/0x730
[  103.202417]  ? policy_nodemask+0x14/0xa0
[  103.202423]  ? policy_node+0x59/0x80
[  103.202427]  ? target_fabric_wwn_cmd_completion_affinity_show+0x40/0x40 [target_core_mod]
[  103.202471]  configfs_write_iter+0x15e/0x1e0
[  103.202482]  new_sync_write+0x206/0x310
[  103.202488]  ? new_sync_read+0x300/0x300
[  103.202493]  ? avc_policy_seqno+0x1d/0x30
[  103.202500]  ? selinux_file_permission+0x18b/0x1c0
[  103.202506]  vfs_write+0x33f/0x3e0
[  103.202510]  ksys_write+0xb4/0x150
[  103.202514]  ? __ia32_sys_read+0x40/0x40
[  103.202519]  do_syscall_64+0x3b/0x90
[  103.202524]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
[  103.202529] RIP: 0033:0x7f40fc2017a7
[  103.202534] Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
[  103.202537] RSP: 002b:00007fff2c851918 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[  103.202545] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f40fc2017a7
[  103.202548] RDX: 0000000000000002 RSI: 00005577dfb42ab0 RDI: 0000000000000001
[  103.202550] RBP: 00005577dfb42ab0 R08: 000000000000000a R09: 00007f40fc2974e0
[  103.202552] R10: 00007f40fc2973e0 R11: 0000000000000246 R12: 0000000000000002
[  103.202556] R13: 00007f40fc2d4520 R14: 0000000000000002 R15: 00007f40fc2d4700
[  103.202559]  </TASK>
[  103.202562]
[  103.202565] Allocated by task 832((efault)):
[  103.202568]  kasan_save_stack+0x1e/0x40
[  103.202572]  __kasan_kmalloc+0x81/0xa0
[  103.202575]  srpt_add_one+0x3b/0x60 [ib_srpt]
[  103.202584]  add_client_context+0x23b/0x300 [ib_core]
[  103.202645]  ib_register_client+0x1d5/0x220 [ib_core]
[  103.202703]  0xffffffffc0878076
[  103.202705]  do_one_initcall+0x84/0x280
[  103.202711]  do_init_module+0xdf/0x2e0
[  103.202719]  load_module+0x2b9e/0x2c90
[  103.202722]  __do_sys_finit_module+0x111/0x190
[  103.202725]  do_syscall_64+0x3b/0x90
[  103.202728]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
[  103.202732]
[  103.202733] Freed by task 1307((efault)):
[  103.202736]  kasan_save_stack+0x1e/0x40
[  103.202739]  kasan_set_track+0x21/0x30
[  103.202742]  kasan_set_free_info+0x20/0x30
[  103.202746]  __kasan_slab_free+0x108/0x170
[  103.202749]  kfree+0x9a/0x320
[  103.202752]  srpt_remove_one.cold+0x23/0x1e1 [ib_srpt]
[  103.202762]  remove_client_context+0x8f/0xd0 [ib_core]
[  103.202819]  disable_device+0xee/0x1e0 [ib_core]
[  103.202877]  __ib_unregister_device+0x59/0xf0 [ib_core]
[  103.202935]  ib_unregister_device_and_put+0x3b/0x50 [ib_core]
[  103.202993]  nldev_dellink+0x126/0x1b0 [ib_core]
[  103.203120]  rdma_nl_rcv_msg+0x1cc/0x310 [ib_core]
[  103.203179]  rdma_nl_rcv+0x172/0x200 [ib_core]
[  103.203238]  netlink_unicast+0x36b/0x4a0
[  103.203247]  netlink_sendmsg+0x3a9/0x6d0
[  103.203250]  sock_sendmsg+0x91/0xa0
[  103.203256]  __sys_sendto+0x16f/0x210
[  103.203258]  __x64_sys_sendto+0x6f/0x80
[  103.203261]  do_syscall_64+0x3b/0x90
[  103.203264]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
[  103.203268]
[  103.203269] The buggy address belongs to the object at ffff888107d3d000
                which belongs to the cache kmalloc-2k of size 2048
[  103.203272] The buggy address is located 288 bytes inside of
                2048-byte region [ffff888107d3d000, ffff888107d3d800)
[  103.203275]

[  103.203275] The buggy address belongs to the physical page:
[  103.203277] page:00000000fcc35344 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff888107d39000 pfn:0x107d38
[  103.203282] head:00000000fcc35344 order:3 compound_mapcount:0 compound_pincount:0
[  103.203285] flags: 0x100000000010200(slab|head|node=0|zone=2)
[  103.203290] raw: 0100000000010200 ffffea0004105e08 ffffea00040a8e08 ffff888100042000
[  103.203293] raw: ffff888107d39000 0000000000080005 00000001ffffffff 0000000000000000
[  103.203295] page dumped because: kasan: bad access detected
[  103.203296]
[  103.203296] Memory state around the buggy address:
[  103.203298]  ffff888107d3d000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  103.203300]  ffff888107d3d080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  103.203302] >ffff888107d3d100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  103.203304]                                ^
[  103.203306]  ffff888107d3d180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  103.203308]  ffff888107d3d200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  103.203311] ==================================================================
[  103.203386] Kernel panic - not syncing: kasan.fault=panic set ...
[  103.318885] CPU: 0 PID: 1370 Comm: check Not tainted 5.19.0-rc1-roce-flush+ #85
[  103.320272] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-27-g64f37cc530f1-prebuilt.qemu.org 04/01/2014
[  103.322308] Call Trace:
[  103.322910]  <TASK>
[  103.323542]  dump_stack_lvl+0x34/0x44
[  103.324321]  panic+0x19a/0x345
[  103.324981]  ? panic_print_sys_info.part.0+0x5b/0x5b
[  103.325893]  ? _raw_spin_unlock_irqrestore+0xd/0x40
[  103.326832]  ? preempt_count_sub+0xf/0xb0
[  103.327644]  ? srpt_enable_tpg+0x70/0xec [ib_srpt]
[  103.328523]  end_report.part.0+0x54/0x69
[  103.329320]  kasan_report+0xba/0x120
[  103.330124]  ? srpt_enable_tpg+0x70/0xec [ib_srpt]
[  103.331163]  srpt_enable_tpg+0x70/0xec [ib_srpt]
[  103.332083]  target_fabric_tpg_base_enable_store+0xcc/0x130 [target_core_mod]
[  103.333376]  ? target_fabric_wwn_cmd_completion_affinity_show+0x40/0x40 [target_core_mod]
[  103.334878]  ? rwsem_down_read_slowpath+0x730/0x730
[  103.335808]  ? policy_nodemask+0x14/0xa0
[  103.336605]  ? policy_node+0x59/0x80
[  103.337378]  ? target_fabric_wwn_cmd_completion_affinity_show+0x40/0x40 [target_core_mod]
[  103.338875]  configfs_write_iter+0x15e/0x1e0
[  103.339733]  new_sync_write+0x206/0x310
[  103.340538]  ? new_sync_read+0x300/0x300
[  103.341347]  ? avc_policy_seqno+0x1d/0x30
[  103.342147]  ? selinux_file_permission+0x18b/0x1c0
[  103.344176]  vfs_write+0x33f/0x3e0
[  103.345204]  ksys_write+0xb4/0x150
[  103.345936]  ? __ia32_sys_read+0x40/0x40
[  103.346774]  do_syscall_64+0x3b/0x90
[  103.347548]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
[  103.348460] RIP: 0033:0x7f40fc2017a7
[  103.349192] Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
[  103.352189] RSP: 002b:00007fff2c851918 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[  103.353619] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f40fc2017a7
[  103.354857] RDX: 0000000000000002 RSI: 00005577dfb42ab0 RDI: 0000000000000001
[  103.356137] RBP: 00005577dfb42ab0 R08: 000000000000000a R09: 00007f40fc2974e0
[  103.357520] R10: 00007f40fc2973e0 R11: 0000000000000246 R12: 0000000000000002
[  103.358756] R13: 00007f40fc2d4520 R14: 0000000000000002 R15: 00007f40fc2d4700
[  103.360071]  </TASK>
[  103.360894] Kernel Offset: 0x30e00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[  103.362768] ---[ end Kernel panic - not syncing: kasan.fault=panic set ... ]---
(END)

Thanks
Zhijian