Re: WARNING in percpu_ref_kill_and_confirm (2)

From: Jens Axboe
Date: Wed Dec 16 2020 - 16:47:40 EST


On 12/16/20 2:14 PM, syzbot wrote:
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit: 7b1b868e Merge tag 'for-linus' of git://git.kernel.org/pub..
> git tree: upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=1156046b500000
> kernel config: https://syzkaller.appspot.com/x/.config?x=3416bb960d5c705d
> dashboard link: https://syzkaller.appspot.com/bug?extid=c9937dfb2303a5f18640
> compiler: gcc (GCC) 10.1.0-syz 20200507
> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1407c287500000
> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=10ed5f07500000
>
> The issue was bisected to:
>
> commit 4d004099a668c41522242aa146a38cc4eb59cb1e
> Author: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> Date: Fri Oct 2 09:04:21 2020 +0000
>
> lockdep: Fix lockdep recursion

Ehhh no... This is timing dependent, so probably why it ends up pinpointing
something totally unrelated.

> bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=14e9d433500000
> final oops: https://syzkaller.appspot.com/x/report.txt?x=16e9d433500000
> console output: https://syzkaller.appspot.com/x/log.txt?x=12e9d433500000
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+c9937dfb2303a5f18640@xxxxxxxxxxxxxxxxxxxxxxxxx
> Fixes: 4d004099a668 ("lockdep: Fix lockdep recursion")
>
> RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000441309
> RDX: 0000000000000002 RSI: 00000000200000c0 RDI: 0000000000003ad1
> RBP: 000000000000f2ae R08: 0000000000000002 R09: 00000000004002c8
> R10: 0000000000000000 R11: 0000000000000246 R12: 00000000004021d0
> R13: 0000000000402260 R14: 0000000000000000 R15: 0000000000000000
> ------------[ cut here ]------------
> percpu_ref_kill_and_confirm called more than once on io_ring_ctx_ref_free!
> WARNING: CPU: 0 PID: 8476 at lib/percpu-refcount.c:382 percpu_ref_kill_and_confirm+0x126/0x180 lib/percpu-refcount.c:382
> Modules linked in:
> CPU: 0 PID: 8476 Comm: syz-executor389 Not tainted 5.10.0-rc7-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> RIP: 0010:percpu_ref_kill_and_confirm+0x126/0x180 lib/percpu-refcount.c:382
> Code: 5d 08 48 8d 7b 08 48 89 fa 48 c1 ea 03 80 3c 02 00 75 5d 48 8b 53 08 48 c7 c6 00 4b 9d 89 48 c7 c7 60 4a 9d 89 e8 c6 97 f6 04 <0f> 0b 48 b8 00 00 00 00 00 fc ff df 48 89 ea 48 c1 ea 03 80 3c 02
> RSP: 0018:ffffc9000b94fe10 EFLAGS: 00010086
> RAX: 0000000000000000 RBX: ffff888011da4580 RCX: 0000000000000000
> RDX: ffff88801fe84ec0 RSI: ffffffff8158c835 RDI: fffff52001729fb4
> RBP: ffff88801539f000 R08: 0000000000000001 R09: ffff8880b9e2011b
> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000293
> R13: 0000000000000000 R14: 0000000000000000 R15: ffff88802de28758
> FS: 00000000014ab880(0000) GS:ffff8880b9e00000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f2a7046b000 CR3: 0000000023368000 CR4: 0000000000350ef0
> Call Trace:
> percpu_ref_kill include/linux/percpu-refcount.h:149 [inline]
> io_ring_ctx_wait_and_kill+0x2b/0x450 fs/io_uring.c:8382
> io_uring_release+0x3e/0x50 fs/io_uring.c:8420
> __fput+0x285/0x920 fs/file_table.c:281
> task_work_run+0xdd/0x190 kernel/task_work.c:151
> tracehook_notify_resume include/linux/tracehook.h:188 [inline]
> exit_to_user_mode_loop kernel/entry/common.c:164 [inline]
> exit_to_user_mode_prepare+0x17e/0x1a0 kernel/entry/common.c:191
> syscall_exit_to_user_mode+0x38/0x260 kernel/entry/common.c:266
> entry_SYSCALL_64_after_hwframe+0x44/0xa9
> RIP: 0033:0x441309
> Code: e8 5c ae 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 3b 0a fc ff c3 66 2e 0f 1f 84 00 00 00 00
> RSP: 002b:00007ffed6545d38 EFLAGS: 00000246 ORIG_RAX: 00000000000001a9
> RAX: fffffffffffffff4 RBX: 0000000000000000 RCX: 0000000000441309
> RDX: 0000000000000002 RSI: 00000000200000c0 RDI: 0000000000003ad1
> RBP: 000000000000f2ae R08: 0000000000000002 R09: 00000000004002c8
> R10: 0000000000000000 R11: 0000000000000246 R12: 00000000004021d0
> R13: 0000000000402260 R14: 0000000000000000 R15: 0000000000000000

What I think happens here is that we switch mode, but exit before it's
done. Wonder if there's a wait to wait for that on exit. Tejun?

--
Jens Axboe