Re: [syzbot] KASAN: use-after-free Read in check_all_holdout_tasks_trace

From: Xu, Yanfei
Date: Mon May 24 2021 - 22:32:38 EST

Next message: Po-Hsu Lin: "[PATCH] selftests: timers: rtcpie: skip test if default RTC device does not exist"
Previous message: Dan Williams: "Re: [RFC v2-fix-v2 1/2] x86/tdx: Handle MWAIT and MONITOR"
In reply to: Paul E. McKenney: "Re: [syzbot] KASAN: use-after-free Read in check_all_holdout_tasks_trace"
Next in thread: Paul E. McKenney: "Re: [syzbot] KASAN: use-after-free Read in check_all_holdout_tasks_trace"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On 5/25/21 6:46 AM, Paul E. McKenney wrote:

[Please note: This e-mail is from an EXTERNAL e-mail address]

On Sun, May 23, 2021 at 09:13:50PM -0700, Paul E. McKenney wrote:

On Sun, May 23, 2021 at 08:51:56AM +0200, Dmitry Vyukov wrote:

On Fri, May 21, 2021 at 7:29 PM syzbot
<syzbot+7b2b13f4943374609532@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote:

Hello,

syzbot found the following issue on:

HEAD commit: f18ba26d libbpf: Add selftests for TC-BPF management API
git tree: bpf-next
console output: https://syzkaller.appspot.com/x/log.txt?x=17f50d1ed00000
kernel config: https://syzkaller.appspot.com/x/.config?x=8ff54addde0afb5d
dashboard link: https://syzkaller.appspot.com/bug?extid=7b2b13f4943374609532

Unfortunately, I don't have any reproducer for this issue yet.

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+7b2b13f4943374609532@xxxxxxxxxxxxxxxxxxxxxxxxx

This looks rcu-related. +rcu mailing list

I think I see a possible cause for this, and will say more after some
testing and after becoming more awake Monday morning, Pacific time.

No joy. From what I can see, within RCU Tasks Trace, the calls to
get_task_struct() are properly protected (either by RCU or by an earlier
get_task_struct()), and the calls to put_task_struct() are balanced by
those to get_task_struct().

I could of course have missed something, but at this point I am suspecting
an unbalanced put_task_struct() has been added elsewhere.

As always, extra eyes on this code would be a good thing.

If it were reproducible, I would of course suggest bisection. :-/

Thanx, Paul

Hi Paul,

Could it be?

CPU1 CPU2
trc_add_holdout(t, bhp)
//t->usage==2
release_task
put_task_struct_rcu_user
delayed_put_task_struct
......
put_task_struct(t)
//t->usage==1
check_all_holdout_tasks_trace
->trc_wait_for_one_reader
->trc_del_holdout
->put_task_struct(t)
//t->usage==0 and task_struct freed
READ_ONCE(t->trc_reader_checked)
//ops， t had been freed.

So, after excuting trc_wait_for_one_reader（）, task might had been removed from holdout list and the corresponding task_struct was freed.
And we shouldn't do READ_ONCE(t->trc_reader_checked).

I investigate the trc_wait_for_one_reader（） and found before we excute trc_del_holdout, there is always set t->trc_reader_checked=true. How about we just set the checked flag and unified excute trc_del_holdout()
in check_all_holdout_tasks_trace with checking the flag?

Thanks,
Yanfei

==================================================================
BUG: KASAN: use-after-free in check_all_holdout_tasks_trace+0x302/0x420 kernel/rcu/tasks.h:1084
Read of size 1 at addr ffff88802767a05c by task rcu_tasks_trace/12

CPU: 0 PID: 12 Comm: rcu_tasks_trace Not tainted 5.12.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:79 [inline]
dump_stack+0x141/0x1d7 lib/dump_stack.c:120
print_address_description.constprop.0.cold+0x5b/0x2f8 mm/kasan/report.c:233
__kasan_report mm/kasan/report.c:419 [inline]
kasan_report.cold+0x7c/0xd8 mm/kasan/report.c:436
check_all_holdout_tasks_trace+0x302/0x420 kernel/rcu/tasks.h:1084
rcu_tasks_wait_gp+0x594/0xa60 kernel/rcu/tasks.h:358
rcu_tasks_kthread+0x31c/0x6a0 kernel/rcu/tasks.h:224
kthread+0x3b1/0x4a0 kernel/kthread.c:313
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294

Allocated by task 8477:
kasan_save_stack+0x1b/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:46 [inline]
set_alloc_info mm/kasan/common.c:428 [inline]
__kasan_slab_alloc+0x84/0xa0 mm/kasan/common.c:461
kasan_slab_alloc include/linux/kasan.h:236 [inline]
slab_post_alloc_hook mm/slab.h:524 [inline]
slab_alloc_node mm/slub.c:2912 [inline]
kmem_cache_alloc_node+0x269/0x3e0 mm/slub.c:2948
alloc_task_struct_node kernel/fork.c:171 [inline]
dup_task_struct kernel/fork.c:865 [inline]
copy_process+0x5c8/0x7120 kernel/fork.c:1947
kernel_clone+0xe7/0xab0 kernel/fork.c:2503
__do_sys_clone+0xc8/0x110 kernel/fork.c:2620
do_syscall_64+0x3a/0xb0 arch/x86/entry/common.c:47
entry_SYSCALL_64_after_hwframe+0x44/0xae

Freed by task 12:
kasan_save_stack+0x1b/0x40 mm/kasan/common.c:38
kasan_set_track+0x1c/0x30 mm/kasan/common.c:46
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:357
____kasan_slab_free mm/kasan/common.c:360 [inline]
____kasan_slab_free mm/kasan/common.c:325 [inline]
__kasan_slab_free+0xfb/0x130 mm/kasan/common.c:368
kasan_slab_free include/linux/kasan.h:212 [inline]
slab_free_hook mm/slub.c:1581 [inline]
slab_free_freelist_hook+0xdf/0x240 mm/slub.c:1606
slab_free mm/slub.c:3166 [inline]
kmem_cache_free+0x8a/0x740 mm/slub.c:3182
__put_task_struct+0x26f/0x400 kernel/fork.c:747
trc_wait_for_one_reader kernel/rcu/tasks.h:935 [inline]
check_all_holdout_tasks_trace+0x179/0x420 kernel/rcu/tasks.h:1081
rcu_tasks_wait_gp+0x594/0xa60 kernel/rcu/tasks.h:358
rcu_tasks_kthread+0x31c/0x6a0 kernel/rcu/tasks.h:224
kthread+0x3b1/0x4a0 kernel/kthread.c:313
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294

Last potentially related work creation:
kasan_save_stack+0x1b/0x40 mm/kasan/common.c:38
kasan_record_aux_stack+0xe5/0x110 mm/kasan/generic.c:345
__call_rcu kernel/rcu/tree.c:3038 [inline]
call_rcu+0xb1/0x750 kernel/rcu/tree.c:3113
put_task_struct_rcu_user+0x7f/0xb0 kernel/exit.c:180
release_task+0xca1/0x1690 kernel/exit.c:226
wait_task_zombie kernel/exit.c:1108 [inline]
wait_consider_task+0x2fb5/0x3b40 kernel/exit.c:1335
do_wait_thread kernel/exit.c:1398 [inline]
do_wait+0x724/0xd40 kernel/exit.c:1515
kernel_wait4+0x14c/0x260 kernel/exit.c:1678
__do_sys_wait4+0x13f/0x150 kernel/exit.c:1706
do_syscall_64+0x3a/0xb0 arch/x86/entry/common.c:47
entry_SYSCALL_64_after_hwframe+0x44/0xae

Second to last potentially related work creation:
kasan_save_stack+0x1b/0x40 mm/kasan/common.c:38
kasan_record_aux_stack+0xe5/0x110 mm/kasan/generic.c:345
__call_rcu kernel/rcu/tree.c:3038 [inline]
call_rcu+0xb1/0x750 kernel/rcu/tree.c:3113
put_task_struct_rcu_user+0x7f/0xb0 kernel/exit.c:180
context_switch kernel/sched/core.c:4342 [inline]
__schedule+0x91e/0x23e0 kernel/sched/core.c:5147
preempt_schedule_common+0x45/0xc0 kernel/sched/core.c:5307
preempt_schedule_thunk+0x16/0x18 arch/x86/entry/thunk_64.S:35
try_to_wake_up+0xa12/0x14b0 kernel/sched/core.c:3489
wake_up_process kernel/sched/core.c:3552 [inline]
wake_up_q+0x96/0x100 kernel/sched/core.c:597
futex_wake+0x3e9/0x490 kernel/futex.c:1634
do_futex+0x326/0x1780 kernel/futex.c:3738
__do_sys_futex+0x2a2/0x470 kernel/futex.c:3796
do_syscall_64+0x3a/0xb0 arch/x86/entry/common.c:47
entry_SYSCALL_64_after_hwframe+0x44/0xae

The buggy address belongs to the object at ffff888027679c40
which belongs to the cache task_struct of size 6976
The buggy address is located 1052 bytes inside of
6976-byte region [ffff888027679c40, ffff88802767b780)
The buggy address belongs to the page:
page:ffffea00009d9e00 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff88802767b880 pfn:0x27678
head:ffffea00009d9e00 order:3 compound_mapcount:0 compound_pincount:0
flags: 0xfff00000010200(slab|head|node=0|zone=1|lastcpupid=0x7ff)
raw: 00fff00000010200 ffffea000071e208 ffffea0000950808 ffff888140005140
raw: ffff88802767b880 0000000000040003 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 3, migratetype Unmovable, gfp_mask 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 243, ts 14372676818, free_ts 0
prep_new_page mm/page_alloc.c:2358 [inline]
get_page_from_freelist+0x1033/0x2b60 mm/page_alloc.c:3994
__alloc_pages+0x1b2/0x500 mm/page_alloc.c:5200
alloc_pages+0x18c/0x2a0 mm/mempolicy.c:2272
alloc_slab_page mm/slub.c:1644 [inline]
allocate_slab+0x2c5/0x4c0 mm/slub.c:1784
new_slab mm/slub.c:1847 [inline]
new_slab_objects mm/slub.c:2593 [inline]
___slab_alloc+0x44c/0x7a0 mm/slub.c:2756
__slab_alloc.constprop.0+0xa7/0xf0 mm/slub.c:2796
slab_alloc_node mm/slub.c:2878 [inline]
kmem_cache_alloc_node+0x12f/0x3e0 mm/slub.c:2948
alloc_task_struct_node kernel/fork.c:171 [inline]
dup_task_struct kernel/fork.c:865 [inline]
copy_process+0x5c8/0x7120 kernel/fork.c:1947
kernel_clone+0xe7/0xab0 kernel/fork.c:2503
kernel_thread+0xb5/0xf0 kernel/fork.c:2555
call_usermodehelper_exec_work kernel/umh.c:174 [inline]
call_usermodehelper_exec_work+0xcc/0x180 kernel/umh.c:160
process_one_work+0x98d/0x1600 kernel/workqueue.c:2275
worker_thread+0x64c/0x1120 kernel/workqueue.c:2421
kthread+0x3b1/0x4a0 kernel/kthread.c:313
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294
page_owner free stack trace missing

Memory state around the buggy address:
ffff888027679f00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888027679f80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

ffff88802767a000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

^
ffff88802767a080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88802767a100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================

---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@xxxxxxxxxxxxxxxx.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

--
You received this message because you are subscribed to the Google Groups "syzkaller-bugs" group.
To unsubscribe from this group and stop receiving emails from it, send an email to syzkaller-bugs+unsubscribe@xxxxxxxxxxxxxxxx.
To view this discussion on the web visit https://groups.google.com/d/msgid/syzkaller-bugs/000000000000f034fc05c2da6617%40google.com.

Next message: Po-Hsu Lin: "[PATCH] selftests: timers: rtcpie: skip test if default RTC device does not exist"
Previous message: Dan Williams: "Re: [RFC v2-fix-v2 1/2] x86/tdx: Handle MWAIT and MONITOR"
In reply to: Paul E. McKenney: "Re: [syzbot] KASAN: use-after-free Read in check_all_holdout_tasks_trace"
Next in thread: Paul E. McKenney: "Re: [syzbot] KASAN: use-after-free Read in check_all_holdout_tasks_trace"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]