Re: BUG: KASAN: use-after-free in dec_rlimit_ucounts

From: Qian Cai
Date: Wed Nov 24 2021 - 16:49:22 EST

Next message: Andrew Morton: "Re: [PATCH v3] mm: Defer kmemleak object creation of module_alloc()"
Previous message: Mingwei Zhang: "[PATCH 2/2] KVM: mmu/x86: optimize zapping by retaining non-leaf SPTEs and avoid rcu stall"
Next in thread: Qian Cai: "Re: BUG: KASAN: use-after-free in dec_rlimit_ucounts"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Thu, Nov 18, 2021 at 02:57:17PM -0600, Eric W. Biederman wrote:
> Qian Cai <quic_qiancai@xxxxxxxxxxx> writes:
>
> > On Thu, Nov 18, 2021 at 01:46:05PM -0600, Eric W. Biederman wrote:
> >> Is it possible? Yes it is possible. That is one place where
> >> a use-after-free has shown up and I expect would show up in the
> >> future.
> >>
> >> That said it is hard to believe there is still a user-after-free in the
> >> code. We spent the last kernel development cycle pouring through and
> >> correcting everything we saw until we ultimately found one very subtle
> >> use-after-free.
> >>
> >> If you have a reliable reproducer that you can share, we can look into
> >> this and see if we can track down where the reference count is going
> >> bad.
> >>
> >> It tends to take instrumenting the entire life cycle every increment and
> >> every decrement and then pouring through the logs to track down a
> >> use-after-free. Which is not something we can really do without a
> >> reproducer.
> >
> > The reproducer is just to run trinity by an unprivileged user on defconfig
> > with KASAN enabled (On linux-next, you can do "make defconfig debug.conf"
> > [1], but dont think other debugging options are relevent here.)
> >
> > $ trinity -C 31 -N 10000000
> >
> > It is always reproduced on an arm64 server here within 5-minute so far.
> > Some debugging progress so far. BTW, this could happen on user_shm_unlock()
> > path as well.
>
> Does this only happen on a single architecture? If so I wonder if
> perhaps some of the architectures atomic primitives are implemented
> improperly.

Hmm, I don't know if that or it is just this platfrom is lucky to trigger
the race condition quickly, but I can't reproduce it on x86 so far. I am
Cc'ing a few arm64 people to see if they have spot anything I might be
missing. The original bug report is here:

https://lore.kernel.org/lkml/YZV7Z+yXbsx9p3JN@xxxxxxxxxxxxx/

I did narrow it down the same traces were first introduced by those
commits:

d7c9e99aee48 Reimplement RLIMIT_MEMLOCK on top of ucounts
d64696905554 Reimplement RLIMIT_SIGPENDING on top of ucounts
6e52a9f0532f Reimplement RLIMIT_MSGQUEUE on top of ucounts
21d1c5e386bc Reimplement RLIMIT_NPROC on top of ucounts
b6c336528926 Use atomic_t for ucounts reference counting
905ae01c4ae2 Add a reference to ucounts for each cred
f9c82a4ea89c Increase size of ucounts to atomic_long_t

Also, I added a debugging patch here:

--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -847,8 +847,14 @@ int user_shm_lock(size_t size, struct ucounts *ucounts)

void user_shm_unlock(size_t size, struct ucounts *ucounts)
{
+ int i;
+
spin_lock(&shmlock_user_lock);
+ printk("KK user_shm_unlock ucounts = %d\n", atomic_read(&ucounts->count));
+ for (i = 0; i < UCOUNT_COUNTS; i++)
+ printk("KK type = %d, count = %ld\n", i, atomic_long_read(&ucounts->ucount[i]));
dec_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_MEMLOCK, (size + PAGE_SIZE - 1) >> PAGE_SHIFT);
+ printk("size = %zu, count = %ld\n", size, atomic_long_read(&ucounts->ucount[UCOUNT_RLIMIT_MEMLOCK]));
spin_unlock(&shmlock_user_lock);
put_ucounts(ucounts)

Then, I noticed that ucounts->count is off-by-one. Since the later
put_ucounts() would free the "ucounts", I am wondering if it is actually
correct that "ucounts->count == 1" when entering user_shm_unlock(),
uncounts->ns has already gone. Thus, dec_rlimit_ucounts() should not
blindly traverse ucounts->ns ?

[ 214.541754] KK user_shm_unlock ucounts = 1
[ 214.545871] KK type = 0, count = 0
[ 214.549288] KK type = 1, count = 0
[ 214.552697] KK type = 2, count = 0
[ 214.556104] KK type = 3, count = 0
[ 214.559511] KK type = 4, count = 0
[ 214.562920] KK type = 5, count = 0
[ 214.566314] KK type = 6, count = 0
[ 214.569718] KK type = 7, count = 0
[ 214.573132] KK type = 8, count = 0
[ 214.576537] KK type = 9, count = 0
[ 214.579945] KK type = 10, count = 0
[ 214.583441] KK type = 11, count = 0
[ 214.586940] KK type = 12, count = 0
[ 214.590420] KK type = 13, count = 1
[ 214.593917] ==================================================================
[ 214.601130] BUG: KASAN: use-after-free in dec_rlimit_ucounts+0xe8/0xf0
[ 214.607657] Read of size 8 at addr ffff000905ee12f0 by task trinity-c2/9708
[ 214.614611]
[ 214.616093] CPU: 13 PID: 9708 Comm: trinity-c2 Not tainted 5.12.0-00007-gd7c9e99aee48-dirty #221
[ 214.624870] Hardware name: MiTAC RAPTOR EV-883832-X3-0001/RAPTOR, BIOS 1.6 06/28/2020
[ 214.632689] Call trace:
[ 214.635124] dump_backtrace+0x0/0x350
[ 214.638781] show_stack+0x18/0x28
[ 214.642088] dump_stack+0x120/0x18c
[ 214.645570] print_address_description.constprop.0+0x6c/0x30c
[ 214.651309] kasan_report+0x1d8/0x1f0
[ 214.654964] __asan_report_load8_noabort+0x34/0x60
[ 214.659747] dec_rlimit_ucounts+0xe8/0xf0
[ 214.663748] user_shm_unlock+0xdc/0x338
[ 214.667577] shmem_lock+0x154/0x250
[ 214.671057] shmctl_do_lock+0x310/0x5d8
[ 214.674886] ksys_shmctl.constprop.0+0x200/0x588
[ 214.679496] __arm64_sys_shmctl+0x6c/0xa0
[ 214.683497] el0_svc_common.constprop.0+0xe4/0x300
[ 214.688281] do_el0_svc+0x48/0xd0
[ 214.691587] el0_svc+0x24/0x38
[ 214.694633] el0_sync_handler+0xb0/0xb8
[ 214.698460] el0_sync+0x174/0x180
[ 214.701766]
[ 214.703247] Allocated by task 9392:
[ 214.706726] kasan_save_stack+0x28/0x58
[ 214.710555] __kasan_slab_alloc+0x88/0xa8
[ 214.714555] kmem_cache_alloc+0x190/0x5b0
[ 214.718555] create_user_ns+0x158/0xa60
[ 214.722384] unshare_userns+0x44/0xe0
[ 214.726038] ksys_unshare+0x23c/0x580
[ 214.729693] __arm64_sys_unshare+0x30/0x50
[ 214.733781] el0_svc_common.constprop.0+0xe4/0x300
[ 214.738564] do_el0_svc+0x48/0xd0
[ 214.741871] e
[ 214.752048] asan_set_track+0x28/0x40
[ 214.764227] kasan_set_free_info+0x28/0x50
[ 214.768314] __kasan_slab_free+0xd0/0x130
[ 214.772316] kmem_cache_free+0xb4/0x390
[ 214.776146] free_user_ns+0x108/0x2a8
[ 214.779802] process_one_work+0x684/0xfd0
[ 214.783804] worker_thread+0x314/0xc78
[ 214.787543] kthread+0x3a4/0x460
[ 214.790763] ret_from_fork+0x10/0x30
[ 214.794330]
[ 214.795811] Last potentially related work creation:
[ 214.800678] kasan_save_stack+0x28/0x58
[ 214.804505] kasan_record_aux_stack+0xc0/0xd8
[ 214.808853] insert_work+0x50/0x2f0
[ 214.812334] __queue_work+0x314/0xac8
[ 214.815988] queue_work_on+0x94/0xc8
[ 214.819555] __put_user_ns+0x3c/0x60
[ 214.823122] put_cred_rcu+0x208/0x2f8
[ 214.826775] rcu_core+0x734/0xf68
[ 214.830083] rcu_core_si+0x10/0x20
[ 214.833477] __do_softirq+0x28c/0x774
[ 214.837130]
[ 214.838610] The buggy address belongs to the object at ffff000905ee1110
[ 214.838610] which belongs to the cache user_namespace of size 600
[ 214.851378] The buggy address is located 480 bytes inside of
[ 214.851378] 600-byte region [ffff000905ee1110, ffff000905ee1368)
[ 214.863105] The buggy address belongs to the page:
[ 214.867886] page:000000000a048a0d refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x985ee0
[ 214.877271] head:000000000a048a0d order:3 compound_mapcount:0 compound_pincount:0
[ 214.884744] flags: 0xbfffc0000010200(slab|head)
[ 214.889270] raw: 0bfffc0000010200 dead000000000100 dead000000000122 ffff0008002a3180
[ 214.897003] raw: 0000000000000000 00000000802d002d 00000001ffffffff 0000000000000000
[ 214.904734] page dumped because: kasan: bad access detected
[ 214.910296]
[ 214.911776] Memory state around the buggy address:
[ 214.916557] ffff000905ee1180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 214.923769] ffff000905ee1200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 214.930981] >ffff000905ee1280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 214.938191] ^
[ 214.945056] ffff000905ee1300: fb fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc
[ 214.952267] ffff000905ee1380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 214.959477] ==================================================================
[ 214.967070] Disabling lock debugging due to kernel taint
[ 214.972398] size = 4096, count = 0

>
> Unfortunately I don't have any arm64 machines where I can easily test
> this.
>
> The call path you posted from user_shm_unlock is another path where
> a use-after-free has show up in the past.
>
> My blind guess would be that I made an implementation mistake in
> inc_rlimit_get_ucounts or dec_rlimit_put_ucounts but I can't see it
> right now.
>
> Eric
>
> > Call trace:
> > dec_rlimit_ucounts
> > user_shm_unlock
> > (inlined by) user_shm_unlock at mm/mlock.c:854
> > shmem_lock
> > shmctl_do_lock
> > ksys_shmctl.constprop.0
> > __arm64_sys_shmctl
> > invoke_syscall
> > el0_svc_common.constprop.0
> > do_el0_svc
> > el0_svc
> > el0t_64_sync_handler
> > el0t_64_sync
> >
> > I noticed in dec_rlimit_ucounts(), dec == 0 and type ==
> > UCOUNT_RLIMIT_MEMLOCK.
> >
> > [1] https://lore.kernel.org/lkml/20211115134754.7334-1-quic_qiancai@xxxxxxxxxxx/

Next message: Andrew Morton: "Re: [PATCH v3] mm: Defer kmemleak object creation of module_alloc()"
Previous message: Mingwei Zhang: "[PATCH 2/2] KVM: mmu/x86: optimize zapping by retaining non-leaf SPTEs and avoid rcu stall"
Next in thread: Qian Cai: "Re: BUG: KASAN: use-after-free in dec_rlimit_ucounts"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]