Re: [PATCH] bpf-next: Prevent out of bound buffer write in __bpf_get_stack

From: Andrii Nakryiko

Date: Fri Jan 09 2026 - 14:05:34 EST

On Wed, Jan 7, 2026 at 10:35 AM Lecomte, Arnaud <contact@xxxxxxxxxxxxxx> wrote:
>
>
> On 07/01/2026 19:08, Lecomte, Arnaud wrote:
> >
> > On 06/01/2026 01:51, Andrii Nakryiko wrote:
> >> On Sun, Jan 4, 2026 at 12:52 PM Arnaud Lecomte
> >> <contact@xxxxxxxxxxxxxx> wrote:
> >>> Syzkaller reported a KASAN slab-out-of-bounds write in
> >>> __bpf_get_stack()
> >>> during stack trace copying.
> >>>
> >>> The issue occurs when: the callchain entry (stored as a per-cpu
> >>> variable)
> >>> grow between collection and buffer copy, causing it to exceed the
> >>> initially
> >>> calculated buffer size based on max_depth.
> >>>
> >>> The callchain collection intentionally avoids locking for performance
> >>> reasons, but this creates a window where concurrent modifications can
> >>> occur during the copy operation.
> >>>
> >>> To prevent this from happening, we clamp the trace len to the max
> >>> depth initially calculated with the buffer size and the size of
> >>> a trace.
> >>>
> >>> Reported-by: syzbot+d1b7fa1092def3628bd7@xxxxxxxxxxxxxxxxxxxxxxxxx
> >>> Closes:
> >>> https://lore.kernel.org/all/691231dc.a70a0220.22f260.0101.GAE@xxxxxxxxxx/T/
> >>> Fixes: e17d62fedd10 ("bpf: Refactor stack map trace depth
> >>> calculation into helper function")
> >>> Tested-by: syzbot+d1b7fa1092def3628bd7@xxxxxxxxxxxxxxxxxxxxxxxxx
> >>> Cc: Brahmajit Das <listout@xxxxxxxxxxx>
> >>> Signed-off-by: Arnaud Lecomte <contact@xxxxxxxxxxxxxx>
> >>> ---
> >>> Thanks Brahmajit Das for the initial fix he proposed that I tweaked
> >>> with the correct justification and a better implementation in my
> >>> opinion.
> >>> ---
> >>> kernel/bpf/stackmap.c | 4 ++--
> >>> 1 file changed, 2 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
> >>> index da3d328f5c15..e56752a9a891 100644
> >>> --- a/kernel/bpf/stackmap.c
> >>> +++ b/kernel/bpf/stackmap.c
> >>> @@ -465,7 +465,6 @@ static long __bpf_get_stack(struct pt_regs
> >>> *regs, struct task_struct *task,
> >>>
> >>> if (trace_in) {
> >>> trace = trace_in;
> >>> - trace->nr = min_t(u32, trace->nr, max_depth);
> >>> } else if (kernel && task) {
> >>> trace = get_callchain_entry_for_task(task, max_depth);
> >>> } else {
> >>> @@ -479,7 +478,8 @@ static long __bpf_get_stack(struct pt_regs
> >>> *regs, struct task_struct *task,
> >>> goto err_fault;
> >>> }
> >>>
> >>> - trace_nr = trace->nr - skip;
> >>> + trace_nr = min(trace->nr, max_depth);
> >> there is `trace->nr < skip` check right above, should it be moved here
> >> and done against adjusted trace_nr (but before we subtract skip, of
> >> course)?
> > We could indeed be more proactive on the clamping even-though I would
> > say it does not fundamentally change anything in my opinion.
> > Happy to raise a new rev.
> Nvm, this is not really possible as we are checking that the trace is
> not NULL.
> Moving it above could lead to a NULL dereference.

ok, so what are we doing then?

if (unlikely(!trace)) { ... }

trace_nr = min(trace->nr, max_depth);

if (trace->nr < skip) { ... }

trace_nr = trace->nr - skip;

(which is what I proposed, or am I still missing why this shouldn't be
done like that?)

pw-bot: cr

> >>> + trace_nr = trace_nr - skip;
> >>> copy_len = trace_nr * elem_size;
> >>>
> >>> ips = trace->ip + skip;
> >>> --
> >>> 2.43.0
> >>>
> > Thanks for the review !
> > Arnaud