Re: [PATCH v5] aio: Add support for the POLLFREE

From: Eric Biggers
Date: Tue Nov 23 2021 - 18:23:53 EST


On Tue, Nov 23, 2021 at 11:49:54AM -0800, Eric Biggers wrote:
> On Wed, Oct 27, 2021 at 01:18:34AM +0000, Ramji Jiyani wrote:
> > Add support for the POLLFREE flag to force inline completion of the
> > iocb in aio_poll_wake(). A thread may use it to signal its exit and/or
> > request cleanup while a poll request is pending. In this case,
> > aio_poll_wake() must ensure it holds no reference to the queue entry
> > before returning from the wakeup, to avoid a possible use-after-free
> > via the poll_cancel() path.
> >
> > The use-after-free was found during binder and aio interactions in a
> > certain sequence of events [1].
> >
> > The POLLFREE flag is no longer exclusive to epoll and is now shared
> > with aio. Remove the comment from poll.h to avoid confusion.
> >
> > [1] https://lore.kernel.org/r/CAKUd0B_TCXRY4h1hTztfwWbNSFQqsudDLn2S_28csgWZmZAG3Q@xxxxxxxxxxxxxx/
> >
> > Fixes: af5c72b1fc7a ("Fix aio_poll() races")
> > Signed-off-by: Ramji Jiyani <ramjiyani@xxxxxxxxxx>
> > Reviewed-by: Jeff Moyer <jmoyer@xxxxxxxxxx>
> > Reviewed-by: Christoph Hellwig <hch@xxxxxx>
> > Cc: stable@xxxxxxxxxxxxxxx # 4.19+
> > ---
>
> Looks good, feel free to add:
>
> Reviewed-by: Eric Biggers <ebiggers@xxxxxxxxxx>
>
> I'm still not 100% happy with the commit message, but it's good enough.
> The actual code looks correct.
>
> Who is going to take this patch? This is an important fix; it shouldn't be
> sitting ignored for months. get_maintainer.pl shows:
>
> $ ./scripts/get_maintainer.pl fs/aio.c
> Benjamin LaHaise <bcrl@xxxxxxxxx> (supporter:AIO)
> Alexander Viro <viro@xxxxxxxxxxxxxxxxxx> (maintainer:FILESYSTEMS (VFS and infrastructure))
> linux-aio@xxxxxxxxx (open list:AIO)
> linux-fsdevel@xxxxxxxxxxxxxxx (open list:FILESYSTEMS (VFS and infrastructure))
> linux-kernel@xxxxxxxxxxxxxxx (open list)

Actually, there is a bug in this patch -- it creates a lock inversion between
ctx->ctx_lock (kioctx::ctx_lock) and req->head->lock (wait_queue_head::lock).

Task 1:
signalfd_cleanup()
-> wake_up_poll() [takes wait_queue_head::lock]
-> aio_poll_wake() [takes kioctx::ctx_lock]

Task 2:
sys_io_cancel() [takes kioctx::ctx_lock]
-> aio_poll_cancel() [takes wait_queue_head::lock]

Previously this was okay because the lock operation in aio_poll_wake() was only
a trylock. This patch changes it to a regular lock, which causes a deadlock.

I am able to reproduce this deadlock. It also generates a lockdep report, shown
below. Unfortunately, I don't know how to fix it. Anyone have any ideas?
Al and Christoph, it looks like you wrote most of the aio poll code?

Note, the use-after-free this patch is fixing also affects signalfd, not just
binder, since both rely on POLLFREE. (I was testing it with signalfd.) So we
really need to fix it one way or another...

======================================================
WARNING: possible circular locking dependency detected
5.16.0-rc2-00001-gf97efc5c03bf #22 Not tainted
------------------------------------------------------
aio/137 is trying to acquire lock:
ffff888006170158 (&ctx->ctx_lock){..-.}-{2:2}, at: aio_poll_wake+0x1ac/0x390 fs/aio.c:1693

but task is already holding lock:
ffff8880053a91e0 (&sighand->signalfd_wqh){....}-{2:2}, at: __wake_up_common_lock+0x5b/0xb0 kernel/sched/wait.c:137

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #1 (&sighand->signalfd_wqh){....}-{2:2}:
__lock_acquire+0x4b4/0x960 kernel/locking/lockdep.c:5027
lock_acquire kernel/locking/lockdep.c:5637 [inline]
lock_acquire+0xc9/0x2e0 kernel/locking/lockdep.c:5602
__raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline]
_raw_spin_lock+0x2f/0x40 kernel/locking/spinlock.c:154
spin_lock include/linux/spinlock.h:349 [inline]
aio_poll.constprop.0+0x15d/0x440 fs/aio.c:1773
__io_submit_one.constprop.0+0x139/0x1b0 fs/aio.c:1847
io_submit_one+0x134/0x640 fs/aio.c:1884
__do_sys_io_submit fs/aio.c:1943 [inline]
__se_sys_io_submit fs/aio.c:1913 [inline]
__x64_sys_io_submit+0x89/0x260 fs/aio.c:1913
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0x80 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae

-> #0 (&ctx->ctx_lock){..-.}-{2:2}:
check_prev_add+0x93/0xbf0 kernel/locking/lockdep.c:3063
check_prevs_add kernel/locking/lockdep.c:3186 [inline]
validate_chain+0x585/0x8c0 kernel/locking/lockdep.c:3801
__lock_acquire+0x4b4/0x960 kernel/locking/lockdep.c:5027
lock_acquire kernel/locking/lockdep.c:5637 [inline]
lock_acquire+0xc9/0x2e0 kernel/locking/lockdep.c:5602
__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
_raw_spin_lock_irqsave+0x3e/0x60 kernel/locking/spinlock.c:162
aio_poll_wake+0x1ac/0x390 fs/aio.c:1693
__wake_up_common+0x8c/0x1a0 kernel/sched/wait.c:108
__wake_up_common_lock+0x77/0xb0 kernel/sched/wait.c:138
__wake_up+0xe/0x10 kernel/sched/wait.c:157
signalfd_cleanup+0x33/0x40 fs/signalfd.c:48
__cleanup_sighand kernel/fork.c:1613 [inline]
__cleanup_sighand+0x27/0x50 kernel/fork.c:1610
__exit_signal+0x236/0x380 kernel/exit.c:159
release_task+0x180/0x3d0 kernel/exit.c:200
wait_task_zombie+0x28a/0x600 kernel/exit.c:1114
wait_consider_task+0x121/0x160 kernel/exit.c:1341
do_wait_thread kernel/exit.c:1404 [inline]
do_wait+0x21b/0x380 kernel/exit.c:1521
kernel_wait4+0xaa/0x150 kernel/exit.c:1684
__do_sys_wait4+0x85/0x90 kernel/exit.c:1712
__se_sys_wait4 kernel/exit.c:1708 [inline]
__x64_sys_wait4+0x17/0x20 kernel/exit.c:1708
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0x80 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae

other info that might help us debug this:

Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&sighand->signalfd_wqh);
                               lock(&ctx->ctx_lock);
                               lock(&sighand->signalfd_wqh);
  lock(&ctx->ctx_lock);

*** DEADLOCK ***

2 locks held by aio/137:
#0: ffffffff81e06098 (tasklist_lock){.+.+}-{2:2}, at: release_task+0x110/0x3d0 kernel/exit.c:197
#1: ffff8880053a91e0 (&sighand->signalfd_wqh){....}-{2:2}, at: __wake_up_common_lock+0x5b/0xb0 kernel/sched/wait.c:137

stack backtrace:
CPU: 3 PID: 137 Comm: aio Not tainted 5.16.0-rc2-00001-gf97efc5c03bf #22
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.14.0-1 04/01/2014
Call Trace:
<TASK>
show_stack+0x3d/0x3f arch/x86/kernel/dumpstack.c:318
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x49/0x5e lib/dump_stack.c:106
dump_stack+0x10/0x12 lib/dump_stack.c:113
print_circular_bug.cold+0x13e/0x143 kernel/locking/lockdep.c:2021
check_noncircular+0xfe/0x110 kernel/locking/lockdep.c:2143
check_prev_add+0x93/0xbf0 kernel/locking/lockdep.c:3063
check_prevs_add kernel/locking/lockdep.c:3186 [inline]
validate_chain+0x585/0x8c0 kernel/locking/lockdep.c:3801
__lock_acquire+0x4b4/0x960 kernel/locking/lockdep.c:5027
lock_acquire kernel/locking/lockdep.c:5637 [inline]
lock_acquire+0xc9/0x2e0 kernel/locking/lockdep.c:5602
__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
_raw_spin_lock_irqsave+0x3e/0x60 kernel/locking/spinlock.c:162
aio_poll_wake+0x1ac/0x390 fs/aio.c:1693
__wake_up_common+0x8c/0x1a0 kernel/sched/wait.c:108
__wake_up_common_lock+0x77/0xb0 kernel/sched/wait.c:138
__wake_up+0xe/0x10 kernel/sched/wait.c:157
signalfd_cleanup+0x33/0x40 fs/signalfd.c:48
__cleanup_sighand kernel/fork.c:1613 [inline]
__cleanup_sighand+0x27/0x50 kernel/fork.c:1610
__exit_signal+0x236/0x380 kernel/exit.c:159
release_task+0x180/0x3d0 kernel/exit.c:200
wait_task_zombie+0x28a/0x600 kernel/exit.c:1114
wait_consider_task+0x121/0x160 kernel/exit.c:1341
do_wait_thread kernel/exit.c:1404 [inline]
do_wait+0x21b/0x380 kernel/exit.c:1521
kernel_wait4+0xaa/0x150 kernel/exit.c:1684
__do_sys_wait4+0x85/0x90 kernel/exit.c:1712
__se_sys_wait4 kernel/exit.c:1708 [inline]
__x64_sys_wait4+0x17/0x20 kernel/exit.c:1708
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0x80 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f23a3b0e9ea
Code: ff e9 0a 00 00 00 66 2e 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 49 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 3d 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 5e c3 0f 1f 44 00 00 48 83 ec 28 89 54 24 14
RSP: 002b:00007ffcd0926098 EFLAGS: 00000246 ORIG_RAX: 000000000000003d
RAX: ffffffffffffffda RBX: 000000000000000a RCX: 00007f23a3b0e9ea
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000ffffffff
RBP: 000055944067b000 R08: fffffffe7fffffff R09: fffffffe7fffffff
R10: 0000000000000000 R11: 0000000000000246 R12: 00005594406761c0
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
</TASK>