[PATCH] bpf: Fix use-after-free on mm_struct in bpf_find_vma()
From: Sanghyun Park
Date: Fri May 29 2026 - 03:13:39 EST
bpf_find_vma() reads task->mm without holding task_lock() or taking an
mm reference via mmget()/mmget_not_zero(). When called on a foreign task
obtained via bpf_task_from_pid(), a concurrent exit_mm() can free the
mm_struct between the raw pointer read and mmap_read_trylock(mm),
resulting in a use-after-free on the mm's mmap_lock.
This is the same bug class fixed by commit d8e27d2d22b6 ("bpf: fix mm
lifecycle in open-coded task_vma iterator") for the open-coded task_vma
iterator, but bpf_find_vma() in the same file was missed by that fix.
For the current task, task->mm is stable and needs no extra reference.
For a foreign task, use get_task_mm(), which acquires task_lock(), checks
task->mm, and calls mmget() atomically, preventing the race with
exit_mm(). The reference is dropped via mmput() after the mmap lock is
released.
Race:
  CPU0 (BPF program)              CPU1 (exiting task)
  ============================    ==========================
  bpf_find_vma(foreign_task):
    mm = task->mm
    // raw read, no reference
                                  exit_mm():
                                    task->mm = NULL
                                    mmput(mm) -> frees mm_struct
    mmap_read_trylock(mm)
    // UAF: mm is freed
Reproduction:
1. Build kernel >= 5.17 with CONFIG_KASAN=y, CONFIG_BPF_SYSCALL=y
2. Boot in a VM (QEMU works fine)
3. Compile the attached reproducer (repro.c):
   gcc -O2 -o repro -static repro.c -lbpf -lelf -lz
4. Run as root: ./repro
5. Check dmesg for: BUG: KASAN: slab-use-after-free in down_read_trylock
The reproducer attaches a BPF program that calls bpf_find_vma() on a
foreign task obtained via bpf_task_from_pid(). A racing thread
repeatedly forks and exits that task, opening a window in which the mm
is freed.
KASAN report (reproduced on 6.12.91, CONFIG_PREEMPT + KASAN):
BUG: KASAN: slab-use-after-free in down_read_trylock+0x380/0x3f0
Read of size 8 at addr ffff888003cd2fd0 by task repro/164451
Call Trace:
down_read_trylock+0x380/0x3f0
bpf_find_vma+0xdd/0x360
bpf_prog_708df9c9a3e172a7_main_f+0x8b/0x9e
bpf_trampoline_6442513469+0x43/0xa3
Freed by task 164453:
kmem_cache_free+0x15d/0x4b0
finish_task_switch.isra.0+0x4ab/0x810
Fixes: 7c7e3d31e785 ("bpf: Introduce helper bpf_find_vma")
Signed-off-by: Sanghyun Park <sanghyun.park.cnu@xxxxxxxxx>
---
Hi,
I'm Sanghyun Park, a security researcher. I found this while auditing
the BPF task_iter code. The bug has existed since bpf_find_vma() was
introduced in 5.17 and affects all kernels since then, including those
shipped by major distros (Ubuntu 22.04+, Fedora 38+, Debian 12+, RHEL 9+).
The C reproducer is attached separately (repro.c).
kernel/bpf/task_iter.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
index 5af9e130e5..a1b2c3d4e5 100644
--- a/kernel/bpf/task_iter.c
+++ b/kernel/bpf/task_iter.c
@@ -758,6 +758,7 @@ BPF_CALL_5(bpf_find_vma, struct task_struct *, task, u64, start,
 	struct vm_area_struct *vma;
 	bool irq_work_busy = false;
 	struct mm_struct *mm;
+	bool foreign = task != current;
 	int ret = -ENOENT;
 
 	if (flags)
@@ -766,8 +767,13 @@ BPF_CALL_5(bpf_find_vma, struct task_struct *, task, u64, start,
 	if (!task)
 		return -ENOENT;
 
-	mm = task->mm;
-	if (!mm)
+	if (foreign) {
+		mm = get_task_mm(task);
+	} else {
+		mm = task->mm;
+	}
+
+	if (!mm)
 		return -ENOENT;
 
 	irq_work_busy = bpf_mmap_unlock_get_irq_work(&work);
@@ -783,6 +789,8 @@ BPF_CALL_5(bpf_find_vma, struct task_struct *, task, u64, start,
 		ret = 0;
 	}
 	bpf_mmap_unlock_mm(work, mm);
+	if (foreign)
+		mmput(mm);
 	return ret;
 }

[ 615.565703] BUG: KASAN: slab-use-after-free in down_read_trylock+0x380/0x3f0
[ 615.566467] Read of size 8 at addr ffff888003cd2fd0 by task repro/164451
[ 615.567382] CPU: 0 UID: 0 PID: 164451 Comm: repro Not tainted 6.12.91 #4
[ 615.567392] Hardware name: QEMU Ubuntu 25.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[ 615.567400] Call Trace:
[ 615.567413] <TASK>
[ 615.567423] dump_stack_lvl+0xba/0x110
[ 615.567437] ? down_read_trylock+0x380/0x3f0
[ 615.567442] print_report+0x174/0x4f6
[ 615.567450] ? __virt_addr_valid+0x86/0x670
[ 615.567456] ? down_read_trylock+0x380/0x3f0
[ 615.567462] kasan_report+0xda/0x110
[ 615.567469] ? down_read_trylock+0x380/0x3f0
[ 615.567475] down_read_trylock+0x380/0x3f0
[ 615.567481] ? __pfx_down_read_trylock+0x10/0x10
[ 615.567486] ? bpf_find_vma+0xb1/0x360
[ 615.567494] ? 0xffffffffc0236d08
[ 615.567511] bpf_find_vma+0xdd/0x360
[ 615.567520] bpf_prog_708df9c9a3e172a7_main_f+0x8b/0x9e
[ 615.567524] bpf_trampoline_6442513469+0x43/0xa3
[ 615.567528] __do_sys_getpid+0x9/0x30
[ 615.567533] do_syscall_64+0xbb/0x1f0
[ 615.567540] entry_SYSCALL_64_after_hwframe+0x77/0x7f
[ 615.567563] RIP: 0033:0x423e1d
[ 615.567568] Code: d5 49 8d 3c 1c eb 9f 66 0f 1f 44 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 d0 ff ff ff f7 d8 64 89 01 48
[ 615.567573] RSP: 002b:00007fd2e154b1a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000027
[ 615.567593] RAX: ffffffffffffffda RBX: 000000000000137a RCX: 0000000000423e1d
[ 615.567598] RDX: 0000000000423e1d RSI: 0000000000423e1d RDI: 0000000000423e1d
[ 615.567601] RBP: 00007fd2e154b2f0 R08: 0000000000000001 R09: 0000000000000001
[ 615.567604] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000020
[ 615.567607] R13: ffffffffffffffd0 R14: 0000000000000000 R15: 00007ffc9d2f0e30
[ 615.567613] </TASK>
[ 615.584391] Allocated by task 164453:
[ 615.584797] kasan_save_stack+0x30/0x50
[ 615.585226] kasan_save_track+0x14/0x30
[ 615.585646] __kasan_slab_alloc+0x89/0x90
[ 615.586081] kmem_cache_alloc_noprof+0x133/0x340
[ 615.586581] copy_mm+0x327/0x2380
[ 615.586953] copy_process+0x6c5b/0x7180
[ 615.587382] kernel_clone+0x101/0x870
[ 615.587796] __do_sys_clone+0xda/0x120
[ 615.588213] do_syscall_64+0xbb/0x1f0
[ 615.588617] entry_SYSCALL_64_after_hwframe+0x77/0x7f
[ 615.589352] Freed by task 164453:
[ 615.589721] kasan_save_stack+0x30/0x50
[ 615.590145] kasan_save_track+0x14/0x30
[ 615.590564] kasan_save_free_info+0x3b/0x70
[ 615.591014] __kasan_slab_free+0x4f/0x70
[ 615.591446] kmem_cache_free+0x15d/0x4b0
[ 615.591872] finish_task_switch.isra.0+0x4ab/0x810
[ 615.592388] __schedule+0xf39/0x2fc0
[ 615.592785] schedule+0xdf/0x340
[ 615.593153] do_nanosleep+0x154/0x500
[ 615.593556] hrtimer_nanosleep+0x150/0x350
[ 615.593999] common_nsleep+0xa6/0xd0
[ 615.594400] __x64_sys_clock_nanosleep+0x33c/0x480
[ 615.594912] do_syscall_64+0xbb/0x1f0
[ 615.595322] entry_SYSCALL_64_after_hwframe+0x77/0x7f
[ 615.596052] The buggy address belongs to the object at ffff888003cd2e40
which belongs to the cache mm_struct of size 2192
[ 615.597309] The buggy address is located 400 bytes inside of
freed 2192-byte region [ffff888003cd2e40, ffff888003cd36d0)
[ 615.598750] The buggy address belongs to the physical page:
[ 615.599336] page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3cd0
[ 615.600171] head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
[ 615.600972] memcg:ffff888002781801
[ 615.601357] anon flags: 0x100000000000040(head|node=0|zone=1)
[ 615.601974] page_type: f5(slab)
[ 615.602334] raw: 0100000000000040 ffff88810004fdc0 0000000000000000 dead000000000001
[ 615.603145] raw: 0000000000000000 00000000000d000d 00000001f5000000 ffff888002781801
[ 615.603960] head: 0100000000000040 ffff88810004fdc0 0000000000000000 dead000000000001
[ 615.604777] head: 0000000000000000 00000000000d000d 00000001f5000000 ffff888002781801
[ 615.605591] head: 0100000000000003 ffffea00000f3401 ffffffffffffffff 0000000000000000
[ 615.606406] head: 0000000000000008 0000000000000000 00000000ffffffff 0000000000000000
[ 615.607218] page dumped because: kasan: bad access detected
[ 615.607991] Memory state around the buggy address:
[ 615.608502] ffff888003cd2e80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 615.609260] ffff888003cd2f00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 615.610014] >ffff888003cd2f80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 615.610781] ^
[ 615.611394] ffff888003cd3000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 615.612157] ffff888003cd3080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 615.612909] ==================================================================
[ 615.672368] Disabling lock debugging due to kernel taint
Attachment:
repro.c
Description: Binary data