Re: KASAN: use-after-free Read in addr_handler (2)

From: Jason Gunthorpe
Date: Fri Jun 26 2020 - 20:45:28 EST


On Sun, Jun 14, 2020 at 04:53:21PM +0800, Hillf Danton wrote:
>
> Wed, 10 Jun 2020 10:02:11 -0700
> > syzbot found the following crash on:
> >
> > HEAD commit: 7ae77150 Merge tag 'powerpc-5.8-1' of git://git.kernel.org..
> > git tree: upstream
> > console output: https://syzkaller.appspot.com/x/log.txt?x=16c0d3a6100000
> > kernel config: https://syzkaller.appspot.com/x/.config?x=d195fe572fb15312
> > dashboard link: https://syzkaller.appspot.com/bug?extid=a929647172775e335941
> > compiler: gcc (GCC) 9.0.0 20181231 (experimental)
> >
> > Unfortunately, I don't have any reproducer for this crash yet.
> >
> > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> > Reported-by: syzbot+a929647172775e335941@xxxxxxxxxxxxxxxxxxxxxxxxx
> >
> > ==================================================================
> > BUG: KASAN: use-after-free in __mutex_lock_common kernel/locking/mutex.c:938 [inline]
> > BUG: KASAN: use-after-free in __mutex_lock+0x1033/0x13c0 kernel/locking/mutex.c:1103
> > Read of size 8 at addr ffff888088ec33b0 by task kworker/u4:5/14014
> >
> > CPU: 1 PID: 14014 Comm: kworker/u4:5 Not tainted 5.7.0-syzkaller #0
> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> > Workqueue: ib_addr process_one_req
> > Call Trace:
> > __dump_stack lib/dump_stack.c:77 [inline]
> > dump_stack+0x188/0x20d lib/dump_stack.c:118
> > print_address_description.constprop.0.cold+0xd3/0x413 mm/kasan/report.c:383
> > __kasan_report mm/kasan/report.c:513 [inline]
> > kasan_report.cold+0x1f/0x37 mm/kasan/report.c:530
> > __mutex_lock_common kernel/locking/mutex.c:938 [inline]
> > __mutex_lock+0x1033/0x13c0 kernel/locking/mutex.c:1103
> > addr_handler+0xa0/0x340 drivers/infiniband/core/cma.c:3100
> > process_one_req+0xfa/0x680 drivers/infiniband/core/addr.c:643
> > process_one_work+0x965/0x16a0 kernel/workqueue.c:2268
> > worker_thread+0x96/0xe20 kernel/workqueue.c:2414
> > kthread+0x388/0x470 kernel/kthread.c:268
> > ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:351
> >
> > Allocated by task 31499:
> > save_stack+0x1b/0x40 mm/kasan/common.c:48
> > set_track mm/kasan/common.c:56 [inline]
> > __kasan_kmalloc mm/kasan/common.c:494 [inline]
> > __kasan_kmalloc.constprop.0+0xbf/0xd0 mm/kasan/common.c:467
> > kmem_cache_alloc_trace+0x153/0x7d0 mm/slab.c:3551
> > kmalloc include/linux/slab.h:555 [inline]
> > kzalloc include/linux/slab.h:669 [inline]
> > __rdma_create_id+0x5b/0x850 drivers/infiniband/core/cma.c:861
> > ucma_create_id+0x1d1/0x590 drivers/infiniband/core/ucma.c:503
> > ucma_write+0x285/0x350 drivers/infiniband/core/ucma.c:1729
> > __vfs_write+0x76/0x100 fs/read_write.c:495
> > vfs_write+0x268/0x5d0 fs/read_write.c:559
> > ksys_write+0x1ee/0x250 fs/read_write.c:612
> > do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:295
> > entry_SYSCALL_64_after_hwframe+0x49/0xb3
> >
> > Freed by task 31496:
> > save_stack+0x1b/0x40 mm/kasan/common.c:48
> > set_track mm/kasan/common.c:56 [inline]
> > kasan_set_free_info mm/kasan/common.c:316 [inline]
> > __kasan_slab_free+0xf7/0x140 mm/kasan/common.c:455
> > __cache_free mm/slab.c:3426 [inline]
> > kfree+0x109/0x2b0 mm/slab.c:3757
> > ucma_close+0x111/0x300 drivers/infiniband/core/ucma.c:1807
> > __fput+0x33e/0x880 fs/file_table.c:281
> > task_work_run+0xf4/0x1b0 kernel/task_work.c:123
> > tracehook_notify_resume include/linux/tracehook.h:188 [inline]
> > exit_to_usermode_loop+0x2fa/0x360 arch/x86/entry/common.c:165
> > prepare_exit_to_usermode arch/x86/entry/common.c:196 [inline]
> > syscall_return_slowpath arch/x86/entry/common.c:279 [inline]
> > do_syscall_64+0x6b1/0x7d0 arch/x86/entry/common.c:305
> > entry_SYSCALL_64_after_hwframe+0x49/0xb3
> >
> > The buggy address belongs to the object at ffff888088ec3000
> > which belongs to the cache kmalloc-2k of size 2048
> > The buggy address is located 944 bytes inside of
> > 2048-byte region [ffff888088ec3000, ffff888088ec3800)
> > The buggy address belongs to the page:
> > page:ffffea000223b0c0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0
> > flags: 0xfffe0000000200(slab)
> > raw: 00fffe0000000200 ffffea000299f588 ffffea000263d0c8 ffff8880aa000e00
> > raw: 0000000000000000 ffff888088ec3000 0000000100000001 0000000000000000
> > page dumped because: kasan: bad access detected
> >
> > Memory state around the buggy address:
> > ffff888088ec3280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> > ffff888088ec3300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> > >ffff888088ec3380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> > ^
> > ffff888088ec3400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> > ffff888088ec3480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> > ==================================================================
>
> Add extra grab to id_priv to make addr_handler() safe.
> It may also fix what's
> Reported-by: syzbot+08092148130652a6faae@xxxxxxxxxxxxxxxxxxxxxxxxx

In some way adding the refcounting is appealing, but it is not quite
right..

Once rdma_resolve_addr() is called the cm_id state is supposed to go
to RDMA_CM_ADDR_QUERY and stay there until the address is resolved.

The first thing rdma_destroy_id() does (which is what triggered the
kfree) is to call cma_cancel_operation(), which does a
cancel_delayed_work_sync().

So, to hit this syzkaller one of these must have happened:
1) rdma_addr_cancel() didn't work and the process_one_work() is still
runnable/running
2) The state changed away from RDMA_CM_ADDR_QUERY without doing
rdma_addr_cancel()
3) rdma_resolve_addr() ran concurrently with rdma_destroy_id()

I think #1 and #3 are not likely as I've already fixed those in the
past, but maybe you can see something?

So, this is probably #2. Actually it looks like there are many cases
where #2 is possible, so lets start there.

Something sort of like this should help: