Re: [syzbot] [kernel?] KCSAN: data-race in kobject_put / kobject_uevent_env (2)

From: Marco Elver
Date: Thu Jul 18 2024 - 06:38:57 EST


On Thu, 18 Jul 2024 at 10:55, syzbot
<syzbot+0ac8e4da6d5cfcc7743e@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit: d67978318827 Merge tag 'x86_cpu_for_v6.11_rc1' of git://gi..
> git tree: upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=11af145e980000
> kernel config: https://syzkaller.appspot.com/x/.config?x=ca8f4e92a17047ec
> dashboard link: https://syzkaller.appspot.com/bug?extid=0ac8e4da6d5cfcc7743e
> compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
>
> Unfortunately, I don't have any reproducer for this issue yet.
>
> Downloadable assets:
> disk image: https://storage.googleapis.com/syzbot-assets/1f219d0a9994/disk-d6797831.raw.xz
> vmlinux: https://storage.googleapis.com/syzbot-assets/91aed26b830f/vmlinux-d6797831.xz
> kernel image: https://storage.googleapis.com/syzbot-assets/c7e753e56950/bzImage-d6797831.xz
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+0ac8e4da6d5cfcc7743e@xxxxxxxxxxxxxxxxxxxxxxxxx
>
> llcp: llcp_sock_recvmsg: Recv datagram failed state 3 -6 0
> ==================================================================
> BUG: KCSAN: data-race in kobject_put / kobject_uevent_env
>
> read-write to 0xffff888139ddec54 of 1 bytes by task 22621 on cpu 0:
> kobject_uevent_env+0x4e/0x550 lib/kobject_uevent.c:476
> kobject_uevent+0x1c/0x30 lib/kobject_uevent.c:641
> device_del+0x6fa/0x780 drivers/base/core.c:3886
> nfc_unregister_device+0x114/0x130 net/nfc/core.c:1183
> nci_unregister_device+0x14c/0x160 net/nfc/nci/core.c:1312
> virtual_ncidev_close+0x30/0x50 drivers/nfc/virtual_ncidev.c:172
> __fput+0x192/0x6f0 fs/file_table.c:422
> ____fput+0x15/0x20 fs/file_table.c:450
> task_work_run+0x13a/0x1a0 kernel/task_work.c:180
> resume_user_mode_work include/linux/resume_user_mode.h:50 [inline]
> exit_to_user_mode_loop kernel/entry/common.c:114 [inline]
> exit_to_user_mode_prepare include/linux/entry-common.h:328 [inline]
> __syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline]
> syscall_exit_to_user_mode+0xbe/0x130 kernel/entry/common.c:218
> do_syscall_64+0xd6/0x1c0 arch/x86/entry/common.c:89
> entry_SYSCALL_64_after_hwframe+0x77/0x7f
>
> read to 0xffff888139ddec54 of 1 bytes by task 22615 on cpu 1:
> kobject_put+0x25/0x180 lib/kobject.c:733
> put_device+0x1f/0x30 drivers/base/core.c:3787
> nfc_put_device net/nfc/nfc.h:103 [inline]
> nfc_llcp_local_put+0x87/0xb0 net/nfc/llcp_core.c:196
> nfc_llcp_sock_free net/nfc/llcp_sock.c:1021 [inline]
> llcp_sock_destruct+0x14d/0x1a0 net/nfc/llcp_sock.c:966
> __sk_destruct+0x3d/0x440 net/core/sock.c:2175
> sk_destruct net/core/sock.c:2223 [inline]
> __sk_free+0x284/0x2d0 net/core/sock.c:2234
> sk_free+0x39/0x70 net/core/sock.c:2245
> sock_put include/net/sock.h:1879 [inline]
> llcp_sock_release+0x38f/0x3d0 net/nfc/llcp_sock.c:646
> __sock_release net/socket.c:659 [inline]
> sock_close+0x68/0x150 net/socket.c:1421
> __fput+0x192/0x6f0 fs/file_table.c:422
> ____fput+0x15/0x20 fs/file_table.c:450
> task_work_run+0x13a/0x1a0 kernel/task_work.c:180
> resume_user_mode_work include/linux/resume_user_mode.h:50 [inline]
> exit_to_user_mode_loop kernel/entry/common.c:114 [inline]
> exit_to_user_mode_prepare include/linux/entry-common.h:328 [inline]
> __syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline]
> syscall_exit_to_user_mode+0xbe/0x130 kernel/entry/common.c:218
> do_syscall_64+0xd6/0x1c0 arch/x86/entry/common.c:89
> entry_SYSCALL_64_after_hwframe+0x77/0x7f
>
> value changed: 0x07 -> 0x0d
>
> Reported by Kernel Concurrency Sanitizer on:
> CPU: 1 PID: 22615 Comm: syz.0.6322 Tainted: G W 6.10.0-syzkaller-01155-gd67978318827 #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/07/2024
> ==================================================================

The two racing accesses here (on bitfield variables
kobj->state_remove_uevent_sent, kobj->state_initialized) are in the
same bitfield. There's no guarantee (by the compiler) that while the
bitfield is being updated the bit at kobj->state_initialized will
remain non-zero, and therefore the WARN in kobject_put() could be
triggered. This appears benign, unless of course someone set
panic_on_warn.

More generally, if the bitfield is updated concurrently, it's very
likely that one of the updates would be lost.

Just my initial observation.