Re: [syzbot] KASAN: use-after-free Read in addr_handler (4)

From: Dmitry Vyukov
Date: Thu Sep 16 2021 - 10:47:18 EST


On Thu, 16 Sept 2021 at 16:45, Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote:
> > > > On Wed, Sep 15, 2021 at 05:41:22AM -0700, syzbot wrote:
> > > > > Hello,
> > > > >
> > > > > syzbot found the following issue on:
> > > > >
> > > > > HEAD commit: 926de8c4326c Merge tag 'acpi-5.15-rc1-3' of git://git.kern..
> > > > > git tree: upstream
> > > > > console output: https://syzkaller.appspot.com/x/log.txt?x=11fd67ed300000
> > > > > kernel config: https://syzkaller.appspot.com/x/.config?x=37df9ef5660a8387
> > > > > dashboard link: https://syzkaller.appspot.com/bug?extid=dc3dfba010d7671e05f5
> > > > > compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.1
> > > > >
> > > > > Unfortunately, I don't have any reproducer for this issue yet.
> > > > >
> > > > > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > > > > Reported-by: syzbot+dc3dfba010d7671e05f5@xxxxxxxxxxxxxxxxxxxxxxxxx
> > > >
> > > > #syz dup: KASAN: use-after-free Write in addr_resolve (2)
> > > >
> > > > Frankly, I still can't figure out how this is happening
> > > >
> > > > RDMA_USER_CM_CMD_RESOLVE_IP triggers a background work and
> > > > RDMA_USER_CM_CMD_DESTROY_ID triggers destruction of the memory the
> > > > work touches.
> > > >
> > > > rdma_addr_cancel() is supposed to ensure that the work isn't and won't
> > > > run.
> > > >
> > > > So to hit this we have to either not call rdma_addr_cancel() when it
> > > > is need, or rdma_addr_cancel() has to be broken and continue to allow
> > > > the work.
> > > >
> > > > I could find nothing along either path, though rdma_addr_cancel()
> > > > relies on some complicated properties of the workqueues I'm not
> > > > entirely positive about.
> > >
> > > I stared at the code, but it's too complex to grasp it all entirely.
> > > There are definitely lots of tricky concurrent state transitions and
> > > potential for unexpected interleavings. My bet would be on some tricky
> > > hard-to-trigger thread interleaving.
> >
> > From a uapi perspective the entire thing is serialized with a mutex..
> >
> > > The only thing I can think of is adding more WARNINGs to the code to
> > > check more of these assumptions. But I don't know if there are any
> > > useful testable assumptions...
> >
> > Do you have any idea why we can't get a reproduction out of syzkaller
> > here?
> >
> > I feel less comfortable with syzkaller's debug output, can you give
> > some idea what it might be doing concurrently?
>
> It looks like a very hard to trigger race (few crashes, no reproducer,
> but KASAN reports look sensible). That's probably the reason syzkaller
> can't create a reproducer.
> From the log it looks like it was triggered by one of these programs
> below. But I tried to reproduce manually and had no success.
> We are currently doing some improvements to race triggering code in
> syzkaller, and may try to use this as a litmus test to see if
> syzkaller will do any better:
> https://github.com/google/syzkaller/issues/612#issuecomment-920961538
>
> Answering your question re what was running concurrently with what.
> Each of the syscalls in these programs can run up to 2 times and
> ultimately any of these calls can race with any. Potentially syzkaller
> can predict values kernel will return (e.g. id's) before kernel
> actually returned them. I guess this does not restrict search area for
> the bug a lot...
>
>
> 11:16:53 executing program 3:
> write$RDMA_USER_CM_CMD_CONNECT(0xffffffffffffffff,
> &(0x7f0000000280)={0x6, 0x118, 0xfa00, {{0xfffffff7, 0x6a492eae,
> "e0e55819482a40c1c535b72b0bc0bc5e4478995957e1d0fe2311a39ee3960d3488407d52fbef30809118fcbaef590c27d04918aa1348b409d45ba277d9f73bd18868a9c4fde7560288298bde7e9a96c1ef280ca62f4a6f591a2181f2e3d3cf52212fa5ae101aa1bf975763cef32e3a2c73b79d0af1d2e58b82243731e6082cab1cb1c643b7bbec2e6d45bca8a6980f148aaefb71f1933ffa50534b83267139b2324e51ffecb57959bf7e98b60516cebc8f05838a7976cef33b64410626c14dca7dcb22f0902aeb045b88656268a6dd922d6a0e7b7002e8ea90020650dced319050db3130089e5011994d90340a93088e0a8b03ea61ac3f53312342b3d6e038ae",
> 0xfc, 0xe1, 0xb2, 0xd0, 0x7, 0x40, 0x0, 0x1}}}, 0x120)
> r0 = openat$pfkey(0xffffffffffffff9c, &(0x7f00000001c0), 0x80800, 0x0)
> r1 = syz_io_uring_setup(0x1c7, &(0x7f0000000080)={0x0, 0x0, 0x0, 0x0,
> 0x7f, 0x0, r0}, &(0x7f00007b6000/0x2000)=nil,
> &(0x7f0000ffd000/0x3000)=nil, &(0x7f0000000140)=<r2=>0x0,
> &(0x7f0000000100)=<r3=>0x0)
> socketpair$unix(0x1, 0x5, 0x0, &(0x7f0000000040)={0xffffffffffffffff,
> <r4=>0xffffffffffffffff})
> r5 = openat(0xffffffffffffff9c,
> &(0x7f0000000000)='/proc/self/exe\x00', 0x0, 0x0)
> mmap(&(0x7f0000000000/0x800000)=nil, 0x800000, 0x0, 0x12, r5, 0x0)
> r6 = openat$rdma_cm(0xffffffffffffff9c, &(0x7f0000000540), 0x2, 0x0)
> write$RDMA_USER_CM_CMD_CREATE_ID(r6, &(0x7f0000000080)={0x0, 0x18,
> 0xfa00, {0xffffffffffffffff,
> &(0x7f0000000000)={<r7=>0xffffffffffffffff}, 0x13f}}, 0x20)
> write$RDMA_USER_CM_CMD_RESOLVE_IP(r6, &(0x7f0000000100)={0x3, 0x40,
> 0xfa00, {{}, {0xa, 0x0, 0x0, @mcast2}, r7}}, 0x48)
> write$RDMA_USER_CM_CMD_RESOLVE_IP(r6, &(0x7f0000000180)={0x3, 0x40,
> 0xfa00, {{0xa, 0x0, 0x0, @local}, {0xa, 0x0, 0x0, @mcast1}, r7}},
> 0x48)
> write$RDMA_USER_CM_CMD_DESTROY_ID(r6, &(0x7f0000000280)={0x1, 0x10,
> 0xfa00, {&(0x7f0000000240), r7}}, 0x18)
> write$RDMA_USER_CM_CMD_CONNECT(r5, &(0x7f00000003c0)={0x6, 0x118,
> 0xfa00, {{0x9, 0x9,
> "f703ff619e427c1d7d50fc023c22feb64ea5083376891585a4a8b539bead7f61210a9010d88379b67ebe7a1fc77fbdd4dccaec4b498eafe4b08e7e5b28e9fe54606f87e9618b9ade4e28b66e04c73fe4660de33c075bb9b1a43c59e485dcc259fb21fed21380f9ec2c61e8d29b6069786e8bc3da0f3bded0acd13548d2d76af6e701a258307fbce30c0f452b6a25f39209c830fe557de6f1fb3fdfe4347be3a9fdfeaca47b97e333a266013beef7cb7d7ea746bca1d3a929747a269df24d019e3e413309e58095182dd5dc3c8a088e94abf8d5cd389749cc80e4e452c8dabe7eaadd8144e2c4392e35c1b5ad3369ee7b2f855e5ebe9bdc0e8a464e8a9e4f54c0",
> 0x2, 0xff, 0x1, 0x8f, 0x6, 0x3, 0x6}, r7}}, 0x120)
> syz_io_uring_submit(r2, r3,
> &(0x7f0000000180)=@IORING_OP_READ=@pass_buffer={0x16, 0x4, 0x0,
> @fd=r4, 0x0, &(0x7f0000000000)=""/7, 0x7}, 0x0)
> syz_io_uring_submit(r2, r3,
> &(0x7f0000002f80)=@IORING_OP_LINK_TIMEOUT={0xf, 0x0, 0x0, 0x0, 0x0,
> &(0x7f0000000240)={0x0, 0x3938700}}, 0x10000007)
> io_uring_enter(r1, 0x45f5, 0x0, 0x0, 0x0, 0xf5ff)
>
>
>
> 11:16:55 executing program 4:
> r0 = openat$rdma_cm(0xffffffffffffff9c, &(0x7f0000000540), 0x2, 0x0)
> write$RDMA_USER_CM_CMD_CREATE_ID(r0, &(0x7f0000000080)={0x0, 0x18,
> 0xfa00, {0xffffffffffffffff,
> &(0x7f0000000000)={<r1=>0xffffffffffffffff}, 0x13f}}, 0x20)
> write$RDMA_USER_CM_CMD_RESOLVE_IP(r0, &(0x7f0000000100)={0x3, 0x40,
> 0xfa00, {{0xa, 0xfffd}, {0xa, 0x0, 0x10000000, @ipv4={'\x00',
> '\xff\xff', @broadcast}, 0x3}, r1, 0xfffffffe}}, 0x48)
> write$RDMA_USER_CM_CMD_RESOLVE_IP(r0, &(0x7f0000000180)={0x3, 0x40,
> 0xfa00, {{0xa, 0x2, 0x0, @ipv4={'\x00', '\xff\xff', @multicast2}},
> {0xa, 0x0, 0x0, @initdev={0xfe, 0x88, '\x00', 0x1, 0x0}}, r1}}, 0xd5)
> write$RDMA_USER_CM_CMD_DESTROY_ID(r0, &(0x7f0000000280)={0x1, 0x10,
> 0xfa00, {&(0x7f0000000240), r1}}, 0x18)
> write$RDMA_USER_CM_CMD_BIND_IP(0xffffffffffffffff,
> &(0x7f0000000000)={0x2, 0x28, 0xfa00, {0x0, {0xa, 0x4e23, 0x9,
> @ipv4={'\x00', '\xff\xff', @rand_addr=0x64010102}, 0x100}, r1}}, 0x30)
> openat$qrtrtun(0xffffffffffffff9c, &(0x7f0000002740), 0x101002)
> io_setup(0x8, &(0x7f0000000600)=<r2=>0x0)
> clock_getres(0xfffffffffffffffd, 0x0)
> r3 = openat$hwrng(0xffffffffffffff9c, &(0x7f0000000040), 0x400, 0x0)
> r4 = openat$vcsa(0xffffffffffffff9c, &(0x7f00000002c0), 0x8000, 0x0)
> r5 = openat$rdma_cm(0xffffffffffffff9c, &(0x7f0000000540), 0x2, 0x0)
> write$RDMA_USER_CM_CMD_CREATE_ID(r5, &(0x7f0000000080)={0x0, 0x18,
> 0xfa00, {0xffffffffffffffff,
> &(0x7f0000000000)={<r6=>0xffffffffffffffff}, 0x13f}}, 0x20)
> write$RDMA_USER_CM_CMD_RESOLVE_IP(r5, &(0x7f0000000100)={0x3, 0x40,
> 0xfa00, {{}, {0xa, 0x0, 0x0, @mcast2}, r6}}, 0x48)
> write$RDMA_USER_CM_CMD_RESOLVE_IP(r5, &(0x7f0000000180)={0x3, 0x40,
> 0xfa00, {{0xa, 0x0, 0x0, @local}, {0xa, 0x0, 0x0, @mcast1}, r6}},
> 0x48)
> write$RDMA_USER_CM_CMD_DESTROY_ID(r5, &(0x7f0000000280)={0x1, 0x10,
> 0xfa00, {&(0x7f0000000240), r6}}, 0x18)
> pipe2(&(0x7f0000000340)={<r7=>0xffffffffffffffff}, 0x800)
> write$RDMA_USER_CM_CMD_RESOLVE_IP(r7, &(0x7f0000000380)={0x3, 0x40,
> 0xfa00, {{0xa, 0x4e23, 0x8, @remote, 0x3ff}, {0xa, 0x4e22, 0x8000,
> @initdev={0xfe, 0x88, '\x00', 0x1, 0x0}, 0x8}, r1, 0x1f}}, 0x48)
> write$RDMA_USER_CM_CMD_LISTEN(r4, &(0x7f0000000300)={0x7, 0x8, 0xfa00,
> {r6, 0x8}}, 0x10)
> io_submit(r2, 0x1, &(0x7f0000000200)=[&(0x7f00000000c0)={0x0, 0x0,
> 0x0, 0x1, 0x0, 0xffffffffffffffff,
> &(0x7f0000000400)="03a0a445bc5d7a9d6c", 0x9, 0x7fffffff, 0x0, 0x0,
> r3}])


I noticed we also had 2 KCSAN reports that mention rdma_resolve_addr.

On commit 1df0d896:
==================================================================
BUG: KCSAN: data-race in addr_handler / cma_check_port

write to 0xffff88809fa40a1c of 4 bytes by task 21 on cpu 1:
cma_comp_exch drivers/infiniband/core/cma.c:426 [inline]
addr_handler+0x9f/0x2b0 drivers/infiniband/core/cma.c:3141
process_one_req+0x22f/0x300 drivers/infiniband/core/addr.c:645
process_one_work+0x3e1/0x9a0 kernel/workqueue.c:2269
worker_thread+0x665/0xbe0 kernel/workqueue.c:2415
kthread+0x20d/0x230 kernel/kthread.c:291
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:293

read to 0xffff88809fa40a1c of 4 bytes by task 11997 on cpu 0:
cma_check_port+0xbd/0x700 drivers/infiniband/core/cma.c:3506
cma_use_port drivers/infiniband/core/cma.c:3541 [inline]
cma_get_port drivers/infiniband/core/cma.c:3623 [inline]
rdma_bind_addr+0x1639/0x1910 drivers/infiniband/core/cma.c:3741
cma_bind_addr drivers/infiniband/core/cma.c:3252 [inline]
rdma_resolve_addr+0x486/0x1240 drivers/infiniband/core/cma.c:3264
ucma_resolve_ip+0x121/0x1b0 drivers/infiniband/core/ucma.c:722
ucma_write+0x229/0x250 drivers/infiniband/core/ucma.c:1764
vfs_write+0x1d6/0x690 fs/read_write.c:576
ksys_write+0xce/0x180 fs/read_write.c:631
__do_sys_write fs/read_write.c:643 [inline]
__se_sys_write fs/read_write.c:640 [inline]
__x64_sys_write+0x3e/0x50 fs/read_write.c:640
do_syscall_64+0x51/0xb0 arch/x86/entry/common.c:384
entry_SYSCALL_64_after_hwframe+0x44/0xa9

Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 11997 Comm: syz-executor.4 Not tainted 5.8.0-rc4-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine,
BIOS Google 01/01/2011
==================================================================

and on commit 5863cc79:
==================================================================
BUG: KCSAN: data-race in cma_comp_exch / rdma_resolve_addr

write to 0xffff8880a73bda1c of 4 bytes by task 7740 on cpu 0:
cma_comp_exch+0x84/0xc0 drivers/infiniband/core/cma.c:441
addr_handler+0x80/0x2f0 drivers/infiniband/core/cma.c:3033
process_one_req+0xc2/0x3a0 drivers/infiniband/core/addr.c:644
process_one_work+0x3d4/0x890 kernel/workqueue.c:2269
worker_thread+0xa0/0x800 kernel/workqueue.c:2415
kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352

read to 0xffff8880a73bda1c of 4 bytes by task 30237 on cpu 1:
rdma_resolve_addr+0x56/0x10e0 drivers/infiniband/core/cma.c:3156
ucma_resolve_ip+0x105/0x180 drivers/infiniband/core/ucma.c:708
ucma_write+0x1fe/0x2a0 drivers/infiniband/core/ucma.c:1684
__vfs_write+0x67/0xc0 fs/read_write.c:494
vfs_write fs/read_write.c:558 [inline]
vfs_write+0x18a/0x390 fs/read_write.c:542
ksys_write+0x17b/0x1b0 fs/read_write.c:611
__do_sys_write fs/read_write.c:623 [inline]
__se_sys_write fs/read_write.c:620 [inline]
__x64_sys_write+0x4c/0x60 fs/read_write.c:620
do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x44/0xa9

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 30237 Comm: syz-executor.0 Not tainted 5.4.0-rc7+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine,
BIOS Google 01/01/2011
==================================================================