Re: [PATCH bpf-next v5 1/3] perf: Refactor get_perf_callchain
From: Tao Chen
Date: Mon Nov 10 2025 - 04:32:09 EST
On 2025/11/10 00:58, bot+bpf-ci@xxxxxxxxxx wrote:
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index fd1d91017..edd3058e4 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -67,6 +67,7 @@ struct perf_callchain_entry_ctx {
 	u32 nr;
 	short contexts;
 	bool contexts_maxed;
+	bool add_mark;
};
[ ... ]
diff --git a/kernel/events/callchain.c b/kernel/events/callchain.c
index 808c0d7a3..fb1f26be2 100644
--- a/kernel/events/callchain.c
+++ b/kernel/events/callchain.c
[ ... ]
+void __get_perf_callchain_user(struct perf_callchain_entry_ctx *ctx, struct pt_regs *regs)
+{
+	int start_entry_idx;
+
+	if (!user_mode(regs)) {
+		if (current->flags & (PF_KTHREAD | PF_USER_WORKER))
+			return;
+		regs = task_pt_regs(current);
+	}
+
+	if (ctx->add_mark)
+		perf_callchain_store_context(ctx, PERF_CONTEXT_USER);
+
+	start_entry_idx = ctx->nr;
                          ^^^^^^^^
+	perf_callchain_user(ctx, regs);
+	fixup_uretprobe_trampoline_entries(ctx->entry, start_entry_idx);
+}
Can this break uprobe trampoline fixups? The original code used
entry->nr for start_entry_idx, but the refactored version uses ctx->nr.
These two counters track different things:
- entry->nr counts ALL entries (incremented by both
perf_callchain_store_context() and perf_callchain_store())
- ctx->nr only counts regular callchain entries (incremented only by
perf_callchain_store(), not by perf_callchain_store_context())
When add_mark is true, perf_callchain_store_context() increments
entry->nr but not ctx->nr, causing them to become out of sync.
For example, if kernel callchain collection runs first with add_mark
set:
  Initial: ctx->nr = entry->nr = 0

  In __get_perf_callchain_kernel():
    perf_callchain_store_context() stores PERF_CONTEXT_KERNEL marker
      -> entry->nr becomes 1, ctx->nr still 0
    perf_callchain_kernel() stores N kernel frames
      -> entry->nr becomes 1+N, ctx->nr becomes N

  In __get_perf_callchain_user():
    perf_callchain_store_context() stores PERF_CONTEXT_USER marker
      -> entry->nr becomes 2+N, ctx->nr still N
    start_entry_idx = ctx->nr assigns N (should be 2+N)
    fixup_uretprobe_trampoline_entries() scans from wrong index
This causes fixup_uretprobe_trampoline_entries() to scan kernel
callchain entries instead of user callchain entries, missing the
uretprobe trampolines that need to be fixed up.
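
The counter drift described above can be reproduced with a small userspace
model. This is only a sketch under stated assumptions: the model_* types and
functions are hypothetical stand-ins written for illustration, not the kernel
implementation. The only behavior carried over from the review is that storing
a context marker bumps entry->nr alone, while storing a regular frame bumps
both entry->nr and ctx->nr.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model; names and sizes are illustrative only. */
#define MODEL_MAX_ENTRIES	64
#define MODEL_CONTEXT_KERNEL	((uint64_t)-128)
#define MODEL_CONTEXT_USER	((uint64_t)-512)

struct model_entry {
	uint64_t nr;			/* counts markers AND frames */
	uint64_t ip[MODEL_MAX_ENTRIES];
};

struct model_ctx {
	struct model_entry *entry;
	uint32_t nr;			/* counts frames only */
	bool add_mark;
};

struct model_result {
	uint64_t idx_from_ctx;		/* what ctx->nr would yield */
	uint64_t idx_from_entry;	/* what ctx->entry->nr would yield */
};

/* Models storing a context marker: only entry->nr advances. */
static void model_store_context(struct model_ctx *ctx, uint64_t marker)
{
	assert(ctx->entry->nr < MODEL_MAX_ENTRIES);
	ctx->entry->ip[ctx->entry->nr++] = marker;
}

/* Models storing a regular frame: entry->nr and ctx->nr both advance. */
static void model_store(struct model_ctx *ctx, uint64_t ip)
{
	assert(ctx->entry->nr < MODEL_MAX_ENTRIES);
	ctx->entry->ip[ctx->entry->nr++] = ip;
	ctx->nr++;
}

/*
 * Replays the sequence from the review: kernel marker, n_kernel frames,
 * user marker. Returns both candidate values for start_entry_idx as seen
 * just before the user frames would be collected.
 */
static struct model_result model_collect(unsigned int n_kernel, bool add_mark)
{
	struct model_entry entry = { 0 };
	struct model_ctx ctx = { .entry = &entry, .nr = 0, .add_mark = add_mark };
	struct model_result res;
	unsigned int i;

	if (ctx.add_mark)
		model_store_context(&ctx, MODEL_CONTEXT_KERNEL);
	for (i = 0; i < n_kernel; i++)
		model_store(&ctx, 0xffff0000u + i);	/* fake kernel frames */
	if (ctx.add_mark)
		model_store_context(&ctx, MODEL_CONTEXT_USER);

	res.idx_from_ctx = ctx.nr;
	res.idx_from_entry = entry.nr;
	return res;
}
```

With N = 3 and add_mark set, model_collect() reports 3 for the ctx->nr
candidate and 5 (2+N) for the ctx->entry->nr candidate; with add_mark clear
the two values agree, so in this model the mismatch only appears when markers
are stored.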
Should this use ctx->entry->nr instead of ctx->nr?
Using ctx->entry->nr looks better; I will change it.
---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md
CI run summary: https://github.com/kernel-patches/bpf/actions/runs/19211481431
--
Best Regards
Tao Chen