Re: [PATCH bpf-next v5 1/3] perf: Refactor get_perf_callchain

From: bot+bpf-ci
Date: Sun Nov 09 2025 - 11:58:29 EST


> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index fd1d91017..edd3058e4 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -67,6 +67,7 @@ struct perf_callchain_entry_ctx {
> 	u32 nr;
> 	short contexts;
> 	bool contexts_maxed;
> +	bool add_mark;
> };

[ ... ]

> diff --git a/kernel/events/callchain.c b/kernel/events/callchain.c
> index 808c0d7a3..fb1f26be2 100644
> --- a/kernel/events/callchain.c
> +++ b/kernel/events/callchain.c

[ ... ]

> +void __get_perf_callchain_user(struct perf_callchain_entry_ctx *ctx, struct pt_regs *regs)
> +{
> +	int start_entry_idx;
> +
> +	if (!user_mode(regs)) {
> +		if (current->flags & (PF_KTHREAD | PF_USER_WORKER))
> +			return;
> +		regs = task_pt_regs(current);
> +	}
> +
> +	if (ctx->add_mark)
> +		perf_callchain_store_context(ctx, PERF_CONTEXT_USER);
> +
> +	start_entry_idx = ctx->nr;
                          ^^^^^^^^
> +	perf_callchain_user(ctx, regs);
> +	fixup_uretprobe_trampoline_entries(ctx->entry, start_entry_idx);
> +}

Can this break uretprobe trampoline fixups? The original code used
entry->nr for start_entry_idx, but the refactored version uses ctx->nr.

These two counters track different things:
- entry->nr counts ALL entries (incremented by both
  perf_callchain_store_context() and perf_callchain_store())
- ctx->nr only counts regular callchain entries (incremented only by
  perf_callchain_store(), not by perf_callchain_store_context())

When add_mark is true, perf_callchain_store_context() increments
entry->nr but not ctx->nr, causing them to become out of sync.

For example, if kernel callchain collection runs first with add_mark
set:

  Initial: ctx->nr = entry->nr = 0

  In __get_perf_callchain_kernel():
    perf_callchain_store_context() stores PERF_CONTEXT_KERNEL marker
      -> entry->nr becomes 1, ctx->nr still 0
    perf_callchain_kernel() stores N kernel frames
      -> entry->nr becomes 1+N, ctx->nr becomes N

  In __get_perf_callchain_user():
    perf_callchain_store_context() stores PERF_CONTEXT_USER marker
      -> entry->nr becomes 2+N, ctx->nr still N
    start_entry_idx = ctx->nr assigns N (should be 2+N)
    fixup_uretprobe_trampoline_entries() scans from wrong index

This causes fixup_uretprobe_trampoline_entries() to start scanning in
the kernel callchain entries instead of at the first user callchain
entry, so the scan range no longer matches the user frames that may
contain uretprobe trampolines.
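
The index arithmetic above can be checked with a small user-space model.
This is only a sketch: the model_* names, MODEL_MAX_ENTRIES, and the
marker and address values are hypothetical stand-ins, not the kernel
definitions. The one assumption taken from the description above is that
a marker store bumps only entry->nr while a frame store bumps both
counters.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel structures; sizes and values are illustrative. */
#define MODEL_MAX_ENTRIES	32
#define MODEL_MARK_KERNEL	0xAAAAULL	/* placeholder for PERF_CONTEXT_KERNEL */
#define MODEL_MARK_USER		0xBBBBULL	/* placeholder for PERF_CONTEXT_USER */

struct model_entry {
	uint64_t nr;				/* counts markers and frames */
	uint64_t ip[MODEL_MAX_ENTRIES];
};

struct model_ctx {
	struct model_entry *entry;
	uint32_t nr;				/* counts frames only */
};

/* Models a context-marker store: only entry->nr advances. */
static void model_store_context(struct model_ctx *ctx, uint64_t marker)
{
	assert(ctx->entry->nr < MODEL_MAX_ENTRIES);
	ctx->entry->ip[ctx->entry->nr++] = marker;
}

/* Models a regular frame store: both counters advance. */
static void model_store(struct model_ctx *ctx, uint64_t ip)
{
	assert(ctx->entry->nr < MODEL_MAX_ENTRIES);
	ctx->entry->ip[ctx->entry->nr++] = ip;
	ctx->nr++;
}

/*
 * Replays the sequence with add_mark set: kernel marker, n_kernel kernel
 * frames, user marker. Returns the index that would be used as
 * start_entry_idx, taken from entry->nr or from ctx->nr.
 */
static uint64_t model_user_start_idx(int n_kernel, int use_entry_nr)
{
	struct model_entry entry = { 0 };
	struct model_ctx ctx = { .entry = &entry, .nr = 0 };
	int i;

	model_store_context(&ctx, MODEL_MARK_KERNEL);
	for (i = 0; i < n_kernel; i++)
		model_store(&ctx, 0x1000ULL + (uint64_t)i);	/* fake kernel ip */
	model_store_context(&ctx, MODEL_MARK_USER);

	return use_entry_nr ? entry.nr : ctx.nr;
}
```

With n_kernel = 3, model_user_start_idx(3, 0) returns 3, which is the
slot holding the last kernel frame, while model_user_start_idx(3, 1)
returns 5, the slot where the first user frame would be stored.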

Should this use ctx->entry->nr instead of ctx->nr?


---
AI reviewed your patch. Please fix the bug or reply by email explaining why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

CI run summary: https://github.com/kernel-patches/bpf/actions/runs/19211481431