Re: [PATCH] bpf-next: Prevent out of bound buffer write in __bpf_get_stack
From: Lecomte, Arnaud
Date: Tue Jan 13 2026 - 16:01:40 EST
On 09/01/2026 19:05, Andrii Nakryiko wrote:
> On Wed, Jan 7, 2026 at 10:35 AM Lecomte, Arnaud <contact@xxxxxxxxxxxxxx> wrote:
>> On 07/01/2026 19:08, Lecomte, Arnaud wrote:
>>> On 06/01/2026 01:51, Andrii Nakryiko wrote:
>>>> On Sun, Jan 4, 2026 at 12:52 PM Arnaud Lecomte
>>>> <contact@xxxxxxxxxxxxxx> wrote:
>>>>> Syzkaller reported a KASAN slab-out-of-bounds write in
>>>>> __bpf_get_stack() during stack trace copying.
>>>>>
>>>>> The issue occurs when the callchain entry (stored as a per-cpu
>>>>> variable) grows between collection and buffer copy, causing it to
>>>>> exceed the buffer size initially calculated from max_depth.
>>>>> The callchain collection intentionally avoids locking for
>>>>> performance reasons, but this creates a window where concurrent
>>>>> modifications can occur during the copy operation.
>>>>>
>>>>> To prevent this, we clamp the trace length to the max depth
>>>>> initially calculated from the buffer size and the size of a trace
>>>>> entry.
>>>>>
>>>>> Reported-by: syzbot+d1b7fa1092def3628bd7@xxxxxxxxxxxxxxxxxxxxxxxxx
>>>>> Closes: https://lore.kernel.org/all/691231dc.a70a0220.22f260.0101.GAE@xxxxxxxxxx/T/
>>>>> Fixes: e17d62fedd10 ("bpf: Refactor stack map trace depth calculation into helper function")
>>>>> Tested-by: syzbot+d1b7fa1092def3628bd7@xxxxxxxxxxxxxxxxxxxxxxxxx
>>>>> Cc: Brahmajit Das <listout@xxxxxxxxxxx>
>>>>> Signed-off-by: Arnaud Lecomte <contact@xxxxxxxxxxxxxx>
>>>>> ---
>>>>> Thanks to Brahmajit Das for the initial fix he proposed, which I
>>>>> tweaked with what I believe is the correct justification and a
>>>>> better implementation.
>>>>> ---
>>>>>  kernel/bpf/stackmap.c | 4 ++--
>>>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
>>>>> index da3d328f5c15..e56752a9a891 100644
>>>>> --- a/kernel/bpf/stackmap.c
>>>>> +++ b/kernel/bpf/stackmap.c
>>>>> @@ -465,7 +465,6 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
>>>>>
>>>>>  	if (trace_in) {
>>>>>  		trace = trace_in;
>>>>> -		trace->nr = min_t(u32, trace->nr, max_depth);
>>>>>  	} else if (kernel && task) {
>>>>>  		trace = get_callchain_entry_for_task(task, max_depth);
>>>>>  	} else {
>>>>> @@ -479,7 +478,8 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
>>>>>  		goto err_fault;
>>>>>  	}
>>>>>
>>>>> -	trace_nr = trace->nr - skip;
>>>>> +	trace_nr = min(trace->nr, max_depth);
>>>>> +	trace_nr = trace_nr - skip;
>>>>>  	copy_len = trace_nr * elem_size;
>>>>>  	ips = trace->ip + skip;
>>>>> --
>>>>> 2.43.0
>>>> there is `trace->nr < skip` check right above, should it be moved
>>>> here and done against adjusted trace_nr (but before we subtract
>>>> skip, of course)?
>>>>
>>>> pw-bot: cr
>>> We could indeed be more proactive on the clamping, even though I
>>> would say it does not fundamentally change anything in my opinion.
>>> Happy to raise a new rev.
>> Nvm, this is not really possible, as we are checking that the trace
>> is not NULL. Moving it above could lead to a NULL dereference.
> ok, so what are we doing then?
>
>     if (unlikely(!trace)) { ... }
>     trace_nr = min(trace->nr, max_depth);
>     if (trace_nr < skip) { ... }
>     trace_nr = trace_nr - skip;
>
> (which is what I proposed, or am I still missing why this shouldn't
> be done like that?)

I think it doesn't really bring any value to adopt this split check.
The underlying problem is the lack of a locking mechanism on the
trace structure. I will try to find a workaround minimizing its
performance impact. I think this is an interesting problem, actually.
Will get back to you soon!
Thanks for the review!

Arnaud