BUG: KASAN: slab-use-after-free in reweight_entity+0x8d4

From: Mikhail Gavrilov
Date: Sat Oct 21 2023 - 08:26:33 EST


Hi,
I caught one slab-use-after-free durung release cycle 6.6 rc6 and it
looks like it will not easily reproduce again.
I tried rebooting and repeating all my actions but slab-use-after-free
did not happen again.

What we know:

1) backtrace:

[ 336.215839] BUG: KASAN: slab-use-after-free in reweight_entity+0x8d4/0xa50
[ 336.215846] Read of size 8 at addr ffff88812d1a80b0 by task
fossilize_repla/10552

[ 336.215850] CPU: 23 PID: 10552 Comm: fossilize_repla Tainted: G
W OEL ------- ---
6.6.0-0.rc6.20231020gitce55c22ec8b2.51.fc40.x86_64+debug #1
[ 336.215853] Hardware name: Micro-Star International Co., Ltd.
MS-7D73/MPG B650I EDGE WIFI (MS-7D73), BIOS 1.61 10/18/2023
[ 336.215854] Call Trace:
[ 336.215856] <TASK>
[ 336.215858] dump_stack_lvl+0x76/0xd0
[ 336.215862] print_report+0xcf/0x670
[ 336.215866] ? reweight_entity+0x8d4/0xa50
[ 336.215868] kasan_report+0xa6/0xe0
[ 336.215871] ? reweight_entity+0x8d4/0xa50
[ 336.215874] reweight_entity+0x8d4/0xa50
[ 336.215877] task_tick_fair+0x101/0xcf0
[ 336.215880] ? __pfx_task_tick_fair+0x10/0x10
[ 336.215882] ? update_rq_clock+0x2c9/0xc70
[ 336.215885] scheduler_tick+0x27c/0xaf0
[ 336.215888] update_process_times+0x157/0x1d0
[ 336.215891] ? __pfx_update_process_times+0x10/0x10
[ 336.215894] tick_sched_handle+0x67/0x130
[ 336.215897] tick_sched_timer+0xb1/0xe0
[ 336.215900] ? __pfx_tick_sched_timer+0x10/0x10
[ 336.215901] __hrtimer_run_queues+0x4d5/0x910
[ 336.215905] ? __pfx___hrtimer_run_queues+0x10/0x10
[ 336.215907] ? ktime_get_update_offsets_now+0x90/0x250
[ 336.215910] hrtimer_interrupt+0x2f5/0x810
[ 336.215913] __sysvec_apic_timer_interrupt+0xfc/0x3b0
[ 336.215916] sysvec_apic_timer_interrupt+0x42/0xc0
[ 336.215919] asm_sysvec_apic_timer_interrupt+0x1a/0x20
[ 336.215922] RIP: 0033:0x7fb90dad8915
[ 336.215939] Code: 00 00 48 8d 73 78 ba 01 00 00 00 48 89 df 4c 89
55 c8 e8 2e 5e d9 ff 48 89 de 4c 89 ef e8 93 94 d9 ff 4c 8b 7d a0 4c
8b 55 c8 <49> 8b 47 10 48 85 c0 0f 84 26 fa ff ff 48 8b 78 08 48 85 ff
0f 84
[ 336.215941] RSP: 002b:00007fb90cde2480 EFLAGS: 00000246
[ 336.215943] RAX: 00007fb8ec077c50 RBX: 00007fb900906708 RCX: 00007fb9008fd3a8
[ 336.215945] RDX: 00007fb8eda0c6f0 RSI: 0000000000000c91 RDI: 0000000000000c92
[ 336.215946] RBP: 00007fb90cde24f0 R08: 0000000000000000 R09: 0000000000000000
[ 336.215947] R10: 00007fb90cde2520 R11: 00007fb90cde25c4 R12: 00007fb90003def0
[ 336.215949] R13: 00007fb90368ea50 R14: 00007fb8ed5fed20 R15: 00007fb9037b5620
[ 336.215952] </TASK>

[ 336.215954] Allocated by task 2:
[ 336.215955] kasan_save_stack+0x33/0x60
[ 336.215958] kasan_set_track+0x25/0x30
[ 336.215959] __kasan_slab_alloc+0x6e/0x70
[ 336.215961] kmem_cache_alloc_node+0x18d/0x420
[ 336.215962] copy_process+0x3be/0x6910
[ 336.215964] kernel_clone+0xc8/0x710
[ 336.215965] kernel_thread+0xb4/0xf0
[ 336.215967] kthreadd+0x9c7/0xe00
[ 336.215969] ret_from_fork+0x31/0x70
[ 336.215971] ret_from_fork_asm+0x1b/0x30

[ 336.215974] Freed by task 150:
[ 336.215975] kasan_save_stack+0x33/0x60
[ 336.215977] kasan_set_track+0x25/0x30
[ 336.215978] kasan_save_free_info+0x2b/0x50
[ 336.215980] __kasan_slab_free+0x10b/0x1a0
[ 336.215982] slab_free_freelist_hook+0x12b/0x1e0
[ 336.215984] kmem_cache_free+0x174/0x480
[ 336.215986] delayed_put_task_struct+0x162/0x1c0
[ 336.215988] rcu_do_batch+0x445/0x1700
[ 336.215990] rcu_core+0x880/0xdb0
[ 336.215992] __do_softirq+0x218/0x8bb

[ 336.215995] Last potentially related work creation:
[ 336.215996] kasan_save_stack+0x33/0x60
[ 336.215997] __kasan_record_aux_stack+0x94/0xa0
[ 336.215999] __call_rcu_common.constprop.0+0xf8/0x1af0
[ 336.216001] __schedule+0x10b4/0x5e90
[ 336.216003] schedule_idle+0x60/0x90
[ 336.216005] do_idle+0x294/0x450
[ 336.216007] cpu_startup_entry+0x55/0x60
[ 336.216008] start_secondary+0x215/0x290
[ 336.216010] secondary_startup_64_no_verify+0x17d/0x18b

[ 336.216013] The buggy address belongs to the object at ffff88812d1a8000
which belongs to the cache task_struct of size 15640
[ 336.216015] The buggy address is located 176 bytes inside of
freed 15640-byte region [ffff88812d1a8000, ffff88812d1abd18)

[ 336.216017] The buggy address belongs to the physical page:
[ 336.216018] page:00000000b186a037 refcount:1 mapcount:0
mapping:0000000000000000 index:0x0 pfn:0x12d1a8
[ 336.216021] head:00000000b186a037 order:3 entire_mapcount:0
nr_pages_mapped:0 pincount:0
[ 336.216022] memcg:ffff88814b078841
[ 336.216023] flags:
0x17ffffc0000840(slab|head|node=0|zone=2|lastcpupid=0x1fffff)
[ 336.216026] page_type: 0xffffffff()
[ 336.216028] raw: 0017ffffc0000840 ffff888100052080 dead000000000122
0000000000000000
[ 336.216030] raw: 0000000000000000 0000000000020002 00000001ffffffff
ffff88814b078841
[ 336.216031] page dumped because: kasan: bad access detected

[ 336.216033] Memory state around the buggy address:
[ 336.216034] ffff88812d1a7f80: ff ff ff ff ff ff ff ff ff ff ff ff
ff ff ff ff
[ 336.216035] ffff88812d1a8000: fa fb fb fb fb fb fb fb fb fb fb fb
fb fb fb fb
[ 336.216036] >ffff88812d1a8080: fb fb fb fb fb fb fb fb fb fb fb fb
fb fb fb fb
[ 336.216037] ^
[ 336.216038] ffff88812d1a8100: fb fb fb fb fb fb fb fb fb fb fb fb
fb fb fb fb
[ 336.216040] ffff88812d1a8180: fb fb fb fb fb fb fb fb fb fb fb fb
fb fb fb fb
[ 336.216041] ==================================================================

2) Line number in code:

❯ /usr/src/kernels/6.6.0-0.rc6.20231020gitce55c22ec8b2.51.fc40.x86_64+debug/scripts/faddr2line
/lib/debug/lib/modules/6.6.0-0.rc6.20231020gitce55c22ec8b2.51.fc40.x86_64+debug/vmlinux
reweight_entity+0x8d4
reweight_entity+0x8d4/0xa50:
__update_min_deadline at kernel/sched/fair.c:805
(inlined by) min_deadline_update at kernel/sched/fair.c:819
(inlined by) min_deadline_cb_propagate at kernel/sched/fair.c:825
(inlined by) reweight_entity at kernel/sched/fair.c:3660

❯ cat -n /usr/src/debug/kernel-6.6-rc6-182-gce55c22ec8b2/linux-6.6.0-0.rc6.20231020gitce55c22ec8b2.51.fc40.x86_64/kernel/sched/fair.c
| sed -n '3650,3670 p'
3650 */
3651 se->vlag = div_s64(se->vlag * old_weight, weight);
3652 } else {
3653 s64 deadline = se->deadline - se->vruntime;
3654 /*
3655 * When the weight changes, the virtual time slope changes and
3656 * we should adjust the relative virtual deadline accordingly.
3657 */
3658 deadline = div_s64(deadline * old_weight, weight);
3659 se->deadline = se->vruntime + deadline;
3660 min_deadline_cb_propagate(&se->run_node, NULL);
3661 }
3662
3663 #ifdef CONFIG_SMP
3664 do {
3665 u32 divider = get_pelt_divider(&se->avg);
3666
3667 se->avg.load_avg = div_u64(se_weight(se) * se->avg.load_sum, divider);
3668 } while (0);
3669 #endif
3670

❯ git blame kernel/sched/fair.c -L 3650,3670
86bfbb7ce4f67 (Peter Zijlstra 2023-05-31 13:58:42 +0200 3650) */
86bfbb7ce4f67 (Peter Zijlstra 2023-05-31 13:58:42 +0200 3651)
se->vlag = div_s64(se->vlag * old_weight, weight);
147f3efaa2418 (Peter Zijlstra 2023-05-31 13:58:44 +0200 3652) } else {
147f3efaa2418 (Peter Zijlstra 2023-05-31 13:58:44 +0200 3653)
s64 deadline = se->deadline - se->vruntime;
147f3efaa2418 (Peter Zijlstra 2023-05-31 13:58:44 +0200 3654) /*
147f3efaa2418 (Peter Zijlstra 2023-05-31 13:58:44 +0200 3655)
* When the weight changes, the virtual time slope changes and
147f3efaa2418 (Peter Zijlstra 2023-05-31 13:58:44 +0200 3656)
* we should adjust the relative virtual deadline accordingly.
147f3efaa2418 (Peter Zijlstra 2023-05-31 13:58:44 +0200 3657) */
147f3efaa2418 (Peter Zijlstra 2023-05-31 13:58:44 +0200 3658)
deadline = div_s64(deadline * old_weight, weight);
147f3efaa2418 (Peter Zijlstra 2023-05-31 13:58:44 +0200 3659)
se->deadline = se->vruntime + deadline;
8dafa9d0eb1a1 (Peter Zijlstra 2023-10-06 21:24:45 +0200 3660)
min_deadline_cb_propagate(&se->run_node, NULL);
86bfbb7ce4f67 (Peter Zijlstra 2023-05-31 13:58:42 +0200 3661) }
86bfbb7ce4f67 (Peter Zijlstra 2023-05-31 13:58:42 +0200 3662)
9059393e4ec1c (Vincent Guittot 2017-05-17 11:50:45 +0200 3663) #ifdef CONFIG_SMP
1ea6c46a23f12 (Peter Zijlstra 2017-05-06 15:59:54 +0200 3664) do {
87e867b4269f2 (Vincent Guittot 2020-06-12 17:47:03 +0200 3665)
u32 divider = get_pelt_divider(&se->avg);
1ea6c46a23f12 (Peter Zijlstra 2017-05-06 15:59:54 +0200 3666)
1ea6c46a23f12 (Peter Zijlstra 2017-05-06 15:59:54 +0200 3667)
se->avg.load_avg = div_u64(se_weight(se) * se->avg.load_sum,
divider);
1ea6c46a23f12 (Peter Zijlstra 2017-05-06 15:59:54 +0200 3668) } while (0);
9059393e4ec1c (Vincent Guittot 2017-05-17 11:50:45 +0200 3669) #endif
9059393e4ec1c (Vincent Guittot 2017-05-17 11:50:45 +0200 3670)

3) Recent commit which change this line is:

❯ git show 8dafa9d0eb1a1
commit 8dafa9d0eb1a1550a0f4d462db9354161bc51e0c
Author: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Date: Fri Oct 6 21:24:45 2023 +0200

sched/eevdf: Fix min_deadline heap integrity

Marek and Biju reported instances of:

"EEVDF scheduling fail, picking leftmost"

which Mike correlated with cgroup scheduling and the min_deadline heap
getting corrupted; some trace output confirms:

> And yeah, min_deadline is hosed somehow:
>
> validate_cfs_rq: --- /
> __print_se: ffff88845cf48080 w: 1024 ve: -58857638 lag:
870381 vd: -55861854 vmd: -66302085 E (11372/tr)
> __print_se: ffff88810d165800 w: 25 ve: -80323686 lag:
22336429 vd: -41496434 vmd: -66302085 E (-1//autogroup-31)
> __print_se: ffff888108379000 w: 25 ve: 0 lag: -57987257 vd:
114632828 vmd: 114632828 N (-1//autogroup-33)
> validate_cfs_rq: min_deadline: -55861854 avg_vruntime:
-62278313462 / 1074 = -57987256

Turns out that reweight_entity(), which tries really hard to be fast,
does not do the normal dequeue+update+enqueue pattern but *does* scale
the deadline.

However, it then fails to propagate the updated deadline value up the
heap.

Fixes: 147f3efaa241 ("sched/fair: Implement an EEVDF-like
scheduling policy")
Reported-by: Marek Szyprowski <m.szyprowski@xxxxxxxxxxx>
Reported-by: Biju Das <biju.das.jz@xxxxxxxxxxxxxx>
Reported-by: Mike Galbraith <efault@xxxxxx>
Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
Tested-by: Marek Szyprowski <m.szyprowski@xxxxxxxxxxx>
Tested-by: Biju Das <biju.das.jz@xxxxxxxxxxxxxx>
Tested-by: Mike Galbraith <efault@xxxxxx>
Link: https://lkml.kernel.org/r/20231006192445.GE743@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index ef7490c4b8b4..a4b904a010c6 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3613,6 +3613,7 @@ static void reweight_entity(struct cfs_rq
*cfs_rq, struct sched_entity *se,
*/
deadline = div_s64(deadline * old_weight, weight);
se->deadline = se->vruntime + deadline;
+ min_deadline_cb_propagate(&se->run_node, NULL);
}

#ifdef CONFIG_SMP

I have no idea what else I could do to help fix this issue.

I also attach the full kernel log below.
My hardware: https://linux-hardware.org/?probe=dd5735f315
Thank you in advance.

--
Best Regards,
Mike Gavrilov.

Attachment: dmesg-slab-use-after-free in reweight_entity.zip
Description: Zip archive