Re: [syzbot] [rdma?] KASAN: slab-use-after-free Read in dev_get_flags

From: Zhu Yanjun
Date: Mon Dec 09 2024 - 08:27:47 EST


On 05.12.24 21:11, syzbot wrote:
Hello,

syzbot found the following issue on:

HEAD commit: 9197b73fd7bb Merge tag '9p-for-6.12-rc4' of https://github..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=12e7025f980000
kernel config: https://syzkaller.appspot.com/x/.config?x=16e543edc81a3008
dashboard link: https://syzkaller.appspot.com/bug?extid=4b87489410b4efd181bf
compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40

Unfortunately, I don't have any reproducer for this issue yet.

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/6c07256fc575/disk-9197b73f.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/9f64355d92f7/vmlinux-9197b73f.xz
kernel image: https://storage.googleapis.com/syzbot-assets/c402fba0b938/bzImage-9197b73f.xz


From the following Call Trace, I assume that the test is based on net namespace based on this "create_new_namespaces+0x425/0x7b0 kernel/nsproxy.c:110".
And the netdev is veth based on this "veth_newlink+0x2c5/0xce0 drivers/net/veth.c:1815".
Then rxe is created based on veth based on this "[ 839.350575][ T930] infiniband syz2: set down" in the link https://syzkaller.appspot.com/x/log.txt?x=12e7025f980000.

Because rxe can not work well with net namespace without the patchset "https://patchwork.kernel.org/project/linux-rdma/cover/20230624073927.707915-1-yanjun.zhu@xxxxxxxxx/";, it should be the root cause of this problem.

When I delved into this problem, I found the following timeline.
1). In the link https://syzkaller.appspot.com/x/log.txt?x=12e7025f980000,

"
[ 839.350575][ T930] infiniband syz2: set down
"

This means that on 839.350575, the event ib_cache_event_task was sent and queued in ib_wq.

2). In the link https://syzkaller.appspot.com/x/log.txt?x=12e7025f980000,

"
[ 843.251853][ T930] team0 (unregistering): Port device team_slave_0 removed
"

It indicates that before 843.251853, T930 should free veth.

3). In the link https://syzkaller.appspot.com/x/log.txt?x=12e7025f980000,

"
[ 850.559070][ T5295] BUG: KASAN: slab-use-after-free in dev_get_flags+0x188/0x1d0
"
This means that on 850.559070, this slab-use-after-free occured.

In all, on 839.350575, the event ib_cache_event_task was sent and queued in ib_wq, on 850.559070, this event was executed.

And before 843.251853, the netdev veth was freed by T930.

The above should be the direct reason of this problem.

It is interesting that an event was queued in ib_wq for so long time.
Need more information to find out why.

If I am missing something, please feel free to let me know.

Zhu Yanjun

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+4b87489410b4efd181bf@xxxxxxxxxxxxxxxxxxxxxxxxx

==================================================================
BUG: KASAN: slab-use-after-free in dev_get_flags+0x188/0x1d0 net/core/dev.c:8782
Read of size 4 at addr ffff8880554640b0 by task kworker/1:4/5295

CPU: 1 UID: 0 PID: 5295 Comm: kworker/1:4 Not tainted 6.12.0-rc3-syzkaller-00399-g9197b73fd7bb #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
Workqueue: infiniband ib_cache_event_task
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:94 [inline]
dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120
print_address_description mm/kasan/report.c:377 [inline]
print_report+0x169/0x550 mm/kasan/report.c:488
kasan_report+0x143/0x180 mm/kasan/report.c:601
dev_get_flags+0x188/0x1d0 net/core/dev.c:8782
rxe_query_port+0x12d/0x260 drivers/infiniband/sw/rxe/rxe_verbs.c:60
__ib_query_port drivers/infiniband/core/device.c:2111 [inline]
ib_query_port+0x168/0x7d0 drivers/infiniband/core/device.c:2143
ib_cache_update+0x1a9/0xb80 drivers/infiniband/core/cache.c:1494
ib_cache_event_task+0xf3/0x1e0 drivers/infiniband/core/cache.c:1568
process_one_work kernel/workqueue.c:3229 [inline]
process_scheduled_works+0xa65/0x1850 kernel/workqueue.c:3310
worker_thread+0x870/0xd30 kernel/workqueue.c:3391
kthread+0x2f2/0x390 kernel/kthread.c:389
ret_from_fork+0x4d/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
</TASK>

Allocated by task 10264:
kasan_save_stack mm/kasan/common.c:47 [inline]
kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
poison_kmalloc_redzone mm/kasan/common.c:377 [inline]
__kasan_kmalloc+0x98/0xb0 mm/kasan/common.c:394
kasan_kmalloc include/linux/kasan.h:257 [inline]
__do_kmalloc_node mm/slub.c:4264 [inline]
__kmalloc_node_noprof+0x22a/0x440 mm/slub.c:4270
__kvmalloc_node_noprof+0x72/0x190 mm/util.c:658
alloc_netdev_mqs+0x9b/0x1000 net/core/dev.c:11097
rtnl_create_link+0x2f9/0xc20 net/core/rtnetlink.c:3403
veth_newlink+0x2c5/0xce0 drivers/net/veth.c:1815
rtnl_newlink_create net/core/rtnetlink.c:3539 [inline]
__rtnl_newlink net/core/rtnetlink.c:3759 [inline]
rtnl_newlink+0x1593/0x20a0 net/core/rtnetlink.c:3772
rtnetlink_rcv_msg+0x741/0xcf0 net/core/rtnetlink.c:6675
netlink_rcv_skb+0x1e5/0x430 net/netlink/af_netlink.c:2551
netlink_unicast_kernel net/netlink/af_netlink.c:1331 [inline]
netlink_unicast+0x7f8/0x990 net/netlink/af_netlink.c:1357
netlink_sendmsg+0x8e4/0xcb0 net/netlink/af_netlink.c:1901
sock_sendmsg_nosec net/socket.c:729 [inline]
__sock_sendmsg+0x223/0x270 net/socket.c:744
__sys_sendto+0x39b/0x4f0 net/socket.c:2214
__do_sys_sendto net/socket.c:2226 [inline]
__se_sys_sendto net/socket.c:2222 [inline]
__x64_sys_sendto+0xde/0x100 net/socket.c:2222
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f

Freed by task 930:
kasan_save_stack mm/kasan/common.c:47 [inline]
kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
kasan_save_free_info+0x40/0x50 mm/kasan/generic.c:579
poison_slab_object mm/kasan/common.c:247 [inline]
__kasan_slab_free+0x59/0x70 mm/kasan/common.c:264
kasan_slab_free include/linux/kasan.h:230 [inline]
slab_free_hook mm/slub.c:2342 [inline]
slab_free mm/slub.c:4579 [inline]
kfree+0x1a0/0x440 mm/slub.c:4727
device_release+0x9b/0x1c0
kobject_cleanup lib/kobject.c:689 [inline]
kobject_release lib/kobject.c:720 [inline]
kref_put include/linux/kref.h:65 [inline]
kobject_put+0x231/0x480 lib/kobject.c:737
netdev_run_todo+0xe79/0x1000 net/core/dev.c:10816
default_device_exit_batch+0xa24/0xaa0 net/core/dev.c:11949
ops_exit_list net/core/net_namespace.c:178 [inline]
cleanup_net+0x89f/0xcc0 net/core/net_namespace.c:626
process_one_work kernel/workqueue.c:3229 [inline]
process_scheduled_works+0xa65/0x1850 kernel/workqueue.c:3310
worker_thread+0x870/0xd30 kernel/workqueue.c:3391
kthread+0x2f2/0x390 kernel/kthread.c:389
ret_from_fork+0x4d/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244

The buggy address belongs to the object at ffff888055464000
which belongs to the cache kmalloc-cg-4k of size 4096
The buggy address is located 176 bytes inside of
freed 4096-byte region [ffff888055464000, ffff888055465000)

The buggy address belongs to the physical page:
page: refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff888055462000 pfn:0x55460
head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
memcg:ffff88805f4cbd01
flags: 0xfff00000000240(workingset|head|node=0|zone=1|lastcpupid=0x7ff)
page_type: f5(slab)
raw: 00fff00000000240 ffff88801ac4f500 ffffea0000e93610 ffffea0001b0de10
raw: ffff888055462000 0000000000040001 00000001f5000000 ffff88805f4cbd01
head: 00fff00000000240 ffff88801ac4f500 ffffea0000e93610 ffffea0001b0de10
head: ffff888055462000 0000000000040001 00000001f5000000 ffff88805f4cbd01
head: 00fff00000000003 ffffea0001551801 ffffffffffffffff 0000000000000000
head: 0000000000000008 0000000000000000 00000000ffffffff 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 3, migratetype Unmovable, gfp_mask 0xd60c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_RETRY_MAYFAIL|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 6569, tgid 6569 (syz-executor), ts 312665355465, free_ts 312602647734
set_page_owner include/linux/page_owner.h:32 [inline]
post_alloc_hook+0x1f3/0x230 mm/page_alloc.c:1537
prep_new_page mm/page_alloc.c:1545 [inline]
get_page_from_freelist+0x3045/0x3190 mm/page_alloc.c:3457
__alloc_pages_noprof+0x292/0x710 mm/page_alloc.c:4733
alloc_pages_mpol_noprof+0x3e8/0x680 mm/mempolicy.c:2265
alloc_slab_page+0x6a/0x120 mm/slub.c:2412
allocate_slab+0x5a/0x2f0 mm/slub.c:2578
new_slab mm/slub.c:2631 [inline]
___slab_alloc+0xcd1/0x14b0 mm/slub.c:3818
__slab_alloc+0x58/0xa0 mm/slub.c:3908
__slab_alloc_node mm/slub.c:3961 [inline]
slab_alloc_node mm/slub.c:4122 [inline]
__do_kmalloc_node mm/slub.c:4263 [inline]
__kmalloc_node_noprof+0x286/0x440 mm/slub.c:4270
__kvmalloc_node_noprof+0x72/0x190 mm/util.c:658
alloc_netdev_mqs+0x9b/0x1000 net/core/dev.c:11097
vti6_init_net+0x104/0x2f0 net/ipv6/ip6_vti.c:1143
ops_init+0x320/0x590 net/core/net_namespace.c:139
setup_net+0x287/0x9e0 net/core/net_namespace.c:356
copy_net_ns+0x33f/0x570 net/core/net_namespace.c:494
create_new_namespaces+0x425/0x7b0 kernel/nsproxy.c:110
page last free pid 6543 tgid 6543 stack trace:
reset_page_owner include/linux/page_owner.h:25 [inline]
free_pages_prepare mm/page_alloc.c:1108 [inline]
free_unref_page+0xcfb/0xf20 mm/page_alloc.c:2638
discard_slab mm/slub.c:2677 [inline]
__put_partials+0xeb/0x130 mm/slub.c:3145
put_cpu_partial+0x17c/0x250 mm/slub.c:3220
__slab_free+0x2ea/0x3d0 mm/slub.c:4449
qlink_free mm/kasan/quarantine.c:163 [inline]
qlist_free_all+0x9a/0x140 mm/kasan/quarantine.c:179
kasan_quarantine_reduce+0x14f/0x170 mm/kasan/quarantine.c:286
__kasan_slab_alloc+0x23/0x80 mm/kasan/common.c:329
kasan_slab_alloc include/linux/kasan.h:247 [inline]
slab_post_alloc_hook mm/slub.c:4085 [inline]
slab_alloc_node mm/slub.c:4134 [inline]
__kmalloc_cache_noprof+0x132/0x2c0 mm/slub.c:4290
kmalloc_noprof include/linux/slab.h:878 [inline]
kzalloc_noprof include/linux/slab.h:1014 [inline]
ref_tracker_alloc+0x14b/0x490 lib/ref_tracker.c:203
__netdev_tracker_alloc include/linux/netdevice.h:4050 [inline]
netdev_hold include/linux/netdevice.h:4079 [inline]
netdev_queue_add_kobject net/core/net-sysfs.c:1786 [inline]
netdev_queue_update_kobjects+0x181/0x550 net/core/net-sysfs.c:1841
register_queue_kobjects net/core/net-sysfs.c:1903 [inline]
netdev_register_kobject+0x265/0x310 net/core/net-sysfs.c:2143
register_netdevice+0x12c5/0x1b00 net/core/dev.c:10491
rtnl_newlink_create net/core/rtnetlink.c:3541 [inline]
__rtnl_newlink net/core/rtnetlink.c:3759 [inline]
rtnl_newlink+0x1728/0x20a0 net/core/rtnetlink.c:3772
rtnetlink_rcv_msg+0x741/0xcf0 net/core/rtnetlink.c:6675
netlink_rcv_skb+0x1e5/0x430 net/netlink/af_netlink.c:2551
netlink_unicast_kernel net/netlink/af_netlink.c:1331 [inline]
netlink_unicast+0x7f8/0x990 net/netlink/af_netlink.c:1357

Memory state around the buggy address:
ffff888055463f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888055464000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888055464080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888055464100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888055464180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@xxxxxxxxxxxxxxxx.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup