Re: [PATCH] nfc: llcp: Fix race in handling llcp_devices

From: Paolo Abeni
Date: Thu Dec 01 2022 - 06:26:23 EST


On Tue, 2022-11-29 at 17:44 +0800, Wang ShaoBo wrote:
> There are multiple paths that operate on the llcp_devices list without
> protection:
>
>         CPU0                              CPU1
>
> nfc_unregister_device()           nfc_register_device()
>  nfc_llcp_unregister_device()      nfc_llcp_register_device() //no lock
>   ...                               list_add(local->list, llcp_devices)
>   local_release()
>    list_del(local->list)
>
>         CPU2
> ...
>  nfc_llcp_find_local()
>   list_for_each_entry(,&llcp_devices,)
>
> A race occurs if any two of the three run simultaneously, as in the
> following crash report. Although syzbot currently has no reproducer,
> our artificially constructed test cases can also reproduce it:
>
> list_del corruption. prev->next should be ffff888060ce7000, but was ffff88802a0ad000. (prev=ffffffff8e536240)
> ------------[ cut here ]------------
> kernel BUG at lib/list_debug.c:59!
> invalid opcode: 0000 [#1] PREEMPT SMP KASAN
> CPU: 0 PID: 16622 Comm: syz-executor.5 Not tainted 6.1.0-rc6-next-20221125-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022
> RIP: 0010:__list_del_entry_valid.cold+0x12/0x72 lib/list_debug.c:59
> Code: f0 ff 0f 0b 48 89 f1 48 c7 c7 60 96 a6 8a 4c 89 e6 e8 4b 29 f0 ff 0f 0b 4c 89 e1 48 89 ee 48 c7 c7 c0 98 a6 8a e8 37 29 f0 ff <0f> 0b 48 89 ee 48 c7 c7 a0 97 a6 8a e8 26 29 f0 ff 0f 0b 4c 89 e2
> RSP: 0018:ffffc900151afd58 EFLAGS: 00010282
> RAX: 000000000000006d RBX: 0000000000000001 RCX: 0000000000000000
> RDX: ffff88801e7eba80 RSI: ffffffff8166001c RDI: fffff52002a35f9d
> RBP: ffff888060ce7000 R08: 000000000000006d R09: 0000000000000000
> R10: 0000000080000000 R11: 0000000000000000 R12: ffffffff8e536240
> R13: ffff88801f3f3000 R14: ffff888060ce1000 R15: ffff888079d855f0
> FS: 0000555556f57400(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f095d5ad988 CR3: 000000002155a000 CR4: 00000000003506f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
> <TASK>
> __list_del_entry include/linux/list.h:134 [inline]
> list_del include/linux/list.h:148 [inline]
> local_release net/nfc/llcp_core.c:171 [inline]
> kref_put include/linux/kref.h:65 [inline]
> nfc_llcp_local_put net/nfc/llcp_core.c:181 [inline]
> nfc_llcp_local_put net/nfc/llcp_core.c:176 [inline]
> nfc_llcp_unregister_device+0xb8/0x260 net/nfc/llcp_core.c:1619
> nfc_unregister_device+0x196/0x330 net/nfc/core.c:1179
> virtual_ncidev_close+0x52/0xb0 drivers/nfc/virtual_ncidev.c:163
> __fput+0x27c/0xa90 fs/file_table.c:320
> task_work_run+0x16f/0x270 kernel/task_work.c:179
> resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
> exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
> exit_to_user_mode_prepare+0x23c/0x250 kernel/entry/common.c:203
> __syscall_exit_to_user_mode_work kernel/entry/common.c:285 [inline]
> syscall_exit_to_user_mode+0x1d/0x50 kernel/entry/common.c:296
> do_syscall_64+0x46/0xb0 arch/x86/entry/common.c:86
> entry_SYSCALL_64_after_hwframe+0x63/0xcd
>
> This patch adds a dedicated mutex, llcp_devices_list_lock, to make
> handling of the llcp_devices list safe.

Why a mutex instead of a spinlock? All the critical sections are very
small (both code- and time-wise), while the list of callers reaching
that code is quite large, making it hard to check that each of them
really runs in process context.

Please switch to a spinlock instead.
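
To illustrate the shape I have in mind, here is a rough userspace sketch,
not kernel code and not a tested patch. The struct and the function names
(llcp_local, llcp_register, llcp_unregister, llcp_find, devices_lock,
devices_unlock) are hypothetical stand-ins, and the C11 atomic_flag busy-wait
lock only mimics what spin_lock()/spin_unlock() on a DEFINE_SPINLOCK() lock
would do in the kernel. The point is that every add, delete and walk of the
list happens inside the same short, non-sleeping critical section:

```c
/* Userspace sketch only. Kernel code would use DEFINE_SPINLOCK(),
 * spin_lock()/spin_unlock() and struct list_head instead. */
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-device LLCP state. */
struct llcp_local {
	int dev_idx;
	struct llcp_local *prev, *next;
};

/* Stand-ins for the llcp_devices list head and the lock guarding it. */
static struct llcp_local *llcp_devices;
static atomic_flag llcp_devices_lock = ATOMIC_FLAG_INIT;

void devices_lock(void)
{
	/* Busy-wait and never sleep, so this is usable where a mutex is not. */
	while (atomic_flag_test_and_set_explicit(&llcp_devices_lock,
						 memory_order_acquire))
		;
}

void devices_unlock(void)
{
	atomic_flag_clear_explicit(&llcp_devices_lock, memory_order_release);
}

/* Counterpart of the list_add() done at register time. */
void llcp_register(struct llcp_local *local)
{
	assert(local != NULL);
	devices_lock();
	local->prev = NULL;
	local->next = llcp_devices;
	if (llcp_devices)
		llcp_devices->prev = local;
	llcp_devices = local;
	devices_unlock();
}

/* Counterpart of the list_del() done at release time. */
void llcp_unregister(struct llcp_local *local)
{
	assert(local != NULL);
	devices_lock();
	if (local->prev)
		local->prev->next = local->next;
	else
		llcp_devices = local->next;
	if (local->next)
		local->next->prev = local->prev;
	local->prev = local->next = NULL;
	devices_unlock();
}

/* Counterpart of the list walk done at lookup time. */
struct llcp_local *llcp_find(int dev_idx)
{
	struct llcp_local *pos, *found = NULL;

	devices_lock();
	for (pos = llcp_devices; pos; pos = pos->next) {
		if (pos->dev_idx == dev_idx) {
			found = pos;
			break;
		}
	}
	devices_unlock();
	return found;
}
```

Note that the sketch returns the entry after dropping the lock; a real fix
would also have to think about the lifetime of that pointer, e.g. by taking
a reference while the lock is still held.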

Cheers,

Paolo