RE: [RFC PATCH 8/8] sched/debug: Print task preferred LLC for scheduler debugging

From: Jianyong Wu

Date: Mon Jun 29 2026 - 22:03:49 EST


Hi Xiao,

Thanks for catching this bug and for suggesting the fix. I'll incorporate it in v2.
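
A minimal userspace sketch of the get_task_mm()/mmput() pinning pattern
suggested below, not the v2 patch itself. Every model_* name is
hypothetical: the mutex stands in for task_lock(p), the counter for
mm->mm_users, and sc_cpu for mm->sc_stat.cpu. It replays the racy
interleaving in a fixed order (pin, exit, read, unpin) to show why the
read stays valid once a reference is held.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins; none of these are kernel definitions. */
struct model_mm {
    atomic_int users;       /* models mm->mm_users */
    int sc_cpu;             /* models mm->sc_stat.cpu */
};

struct model_task {
    pthread_mutex_t lock;   /* models task_lock(p) */
    struct model_mm *mm;    /* NULL once the task has exited */
};

static atomic_int model_mm_frees;   /* number of model_mm objects freed */

/* Create a task that owns one mm with a single (the task's own) reference. */
static void model_task_init(struct model_task *t, int sc_cpu)
{
    struct model_mm *mm = malloc(sizeof(*mm));

    if (!mm)
        abort();
    atomic_init(&mm->users, 1);
    mm->sc_cpu = sc_cpu;
    pthread_mutex_init(&t->lock, NULL);
    t->mm = mm;
}

/* Like get_task_mm(): read t->mm and take a reference under the lock. */
static struct model_mm *model_get_task_mm(struct model_task *t)
{
    struct model_mm *mm;

    pthread_mutex_lock(&t->lock);
    mm = t->mm;
    if (mm)
        atomic_fetch_add(&mm->users, 1);
    pthread_mutex_unlock(&t->lock);
    return mm;
}

/* Like mmput(): drop one reference and free the object on the last one. */
static void model_mmput(struct model_mm *mm)
{
    if (atomic_fetch_sub(&mm->users, 1) == 1) {
        free(mm);
        atomic_fetch_add(&model_mm_frees, 1);
    }
}

/* Like exit_mm(): detach the mm under the lock, then drop the task's ref. */
static void model_exit_mm(struct model_task *t)
{
    struct model_mm *mm;

    pthread_mutex_lock(&t->lock);
    mm = t->mm;
    t->mm = NULL;
    pthread_mutex_unlock(&t->lock);
    if (mm)
        model_mmput(mm);
}

/*
 * Reader that pins the mm, lets the task exit, and only then dereferences.
 * Returns the value read, or -1 if the task no longer has an mm.
 */
static int model_read_across_exit(struct model_task *t)
{
    struct model_mm *mm = model_get_task_mm(t);
    int cpu;

    if (!mm)
        return -1;
    model_exit_mm(t);       /* the task's own reference is dropped here */
    cpu = mm->sc_cpu;       /* still valid: our reference keeps mm alive */
    model_mmput(mm);        /* last reference, so the mm is freed now */
    return cpu;
}
```

After model_task_init(&t, 3), the first model_read_across_exit(&t)
returns 3 and the mm is freed exactly once, on the reader's final put; a
second call returns -1 because the task no longer has an mm. In the
kernel, the equivalent change would take the reference with
get_task_mm(p) before the READ_ONCE() accesses and release it with
mmput(mm) afterwards.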

Thanks
Jianyong

> -----Original Message-----
> From: XIAO WU <xiaowu.417@xxxxxx>
> Sent: Monday, June 29, 2026 3:29 AM
> To: Jianyong Wu <wujianyong@xxxxxxxx>; mingo@xxxxxxxxxx;
> peterz@xxxxxxxxxxxxx; juri.lelli@xxxxxxxxxx; vincent.guittot@xxxxxxxxxx;
> dietmar.eggemann@xxxxxxx; rostedt@xxxxxxxxxxx; bsegall@xxxxxxxxxx;
> mgorman@xxxxxxx; vschneid@xxxxxxxxxx; kprateek.nayak@xxxxxxx;
> sshegde@xxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
> yu.c.chen@xxxxxxxxx; tim.c.chen@xxxxxxxxxxxxxxx
> Cc: justin.he@xxxxxxx; Yuan Zhong <zhongyuan@xxxxxxxx>; Zhiwei Ying
> <yingzhiwei@xxxxxxxx>; Huangsj <huangsj@xxxxxxxx>
> Subject: Re: [RFC PATCH 8/8] sched/debug: Print task preferred LLC for
> scheduler debugging
>
> Hi Jianyong,
>
> I came across the Sashiko AI review of this series and reproduced the
> use-after-free it flagged in sched_show_cache() — a KASAN
> slab-use-after-free triggers when reading /proc/<pid>/sched while the
> target task is concurrently exiting.
>
> The Sashiko review is at:
> https://sashiko.dev/#/patchset/20260625030759.25928-1-wujianyong@hygon.cn
>
> > +static void sched_show_cache(struct task_struct *p, struct seq_file *m)
> > +{
> > +#ifdef CONFIG_SCHED_CACHE
> > +    struct mm_struct *mm = p->mm;
> > +    int sc_cpu, sc_llc, sc_node, pref_llc, pref_node;
> > +
> > +    if (!mm)
> > +        return;
> > +
> > +    sc_cpu = READ_ONCE(mm->sc_stat.cpu);
>
> This saves p->mm into a local variable and checks it for NULL, but
> does so without holding task_lock(p) or taking a reference via
> get_task_mm().  If the target task is concurrently exiting,
> exit_mm() can drop the final reference and free the mm_struct
> between the NULL check and the READ_ONCE(mm->sc_stat.cpu) access,
> resulting in a slab-use-after-free.
>
> The access happens from proc_sched_show_task() which is reachable
> via /proc/<pid>/sched — userspace can trigger this for any visible
> task by simply reading the proc file while the task exits.
>
> === Reproduction ===
>
> Kernel: 7.1.0-rc2-gd93b88951718 #1 PREEMPT(full)
> Arch:   x86_64 (QEMU Standard PC Q35 + ICH9, 2009)
> Config: CONFIG_KASAN=y, CONFIG_SCHED_CACHE=y
>
> Trigger: race fork/exit against /proc/<pid>/sched reads.  16 worker
> threads each fork children and read /proc/<child_pid>/sched while
> the child immediately exits.
>
> === Crash Log ===
>
> [  991.032119][T535366] BUG: KASAN: slab-use-after-free in proc_sched_show_task+0x30c7/0x3470
> [  991.032971][T535366] Read of size 4 at addr ffff88802fe0d960 by task poc/535366
> [  991.033952][T535366] CPU: 1 UID: 0 PID: 535366 Comm: poc Not tainted 7.1.0-rc2-gd93b88951718 #1 PREEMPT(full)
> [  991.033962][T535366] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009)
> [  991.033967][T535366] Call Trace:
> [  991.033970][T535366]  <TASK>
> [  991.033973][T535366]  dump_stack_lvl+0x116/0x1f0
> [  991.033989][T535366]  print_report+0xf4/0x600
> [  991.034021][T535366]  kasan_report+0xe0/0x110
> [  991.034032][T535366]  ? proc_sched_show_task+0x30c7/0x3470
> [  991.034043][T535366]  proc_sched_show_task+0x30c7/0x3470
> [  991.034054][T535366]  sched_show+0xf4/0x1b0
> [  991.034062][T535366]  seq_read_iter+0x513/0x12d0
> [  991.034074][T535366]  seq_read+0x3b1/0x590
> [  991.034093][T535366]  vfs_read+0x1e9/0xd00
> [  991.034153][T535366]  ksys_read+0x12f/0x250
> [  991.034220][T535366]  do_syscall_64+0x129/0x880
> [  991.034240][T535366]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
> [  991.034250][T535366]  </TASK>
>
> The Read of size 4 at an offset into a freed mm_struct matches the
> READ_ONCE(mm->sc_stat.cpu) access racing against exit_mm().
>
> === PoC ===
>
> Build:  gcc -o poc poc.c -static
> Run:    ./poc
>
> /* sched_show_cache UAF PoC: race fork/exit against /proc/<pid>/sched */
> #define _GNU_SOURCE
> #include <stdio.h>
> #include <stdlib.h>
> #include <unistd.h>
> #include <fcntl.h>
> #include <signal.h>
> #include <sys/wait.h>
>
> static void worker(void)
> {
>     char buf[64], path[64];
>     for (int i = 0; i < 50000000; i++) {
>         pid_t pid = fork();
>         /* Child exits at once so its mm is torn down while we read. */
>         if (pid == 0) { _exit(0); }
>         if (pid > 0) {
>             /* Race the /proc/<pid>/sched read against the child's exit. */
>             snprintf(path, sizeof(path), "/proc/%d/sched", pid);
>             int fd = open(path, O_RDONLY);
>             if (fd >= 0) { read(fd, buf, sizeof(buf) - 1); close(fd); }
>         }
>     }
>     _exit(0);
> }
>
> int main(void)
> {
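>     /* Ignore SIGCHLD so exited children are reaped automatically. */
>     signal(SIGCHLD, SIG_IGN);
>     for (int i = 0; i < 16; i++)
>         if (fork() == 0) { worker(); }
>     /* Block until every worker has exited. */
>     while (wait(NULL) > 0);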
>     return 0;
> }
>
> The fix is to use get_task_mm(p) / mmput(mm) around the mm access,
> which safely pins the mm_struct for the duration of the read.
>
> Thanks,
> Xiao
>