[tip: perf/urgent] perf/core: Order the PMU list to fix warning about unordered pmu_ctx_list

From: tip-bot2 for Luo Gengkun
Date: Mon Feb 24 2025 - 13:37:06 EST


The following commit has been merged into the perf/urgent branch of tip:

Commit-ID: 2016066c66192a99d9e0ebf433789c490a6785a2
Gitweb: https://git.kernel.org/tip/2016066c66192a99d9e0ebf433789c490a6785a2
Author: Luo Gengkun <luogengkun@xxxxxxxxxxxxxxx>
AuthorDate: Wed, 22 Jan 2025 07:33:56
Committer: Ingo Molnar <mingo@xxxxxxxxxx>
CommitterDate: Mon, 24 Feb 2025 19:22:37 +01:00

perf/core: Order the PMU list to fix warning about unordered pmu_ctx_list

Syskaller triggers a warning due to prev_epc->pmu != next_epc->pmu in
perf_event_swap_task_ctx_data(). vmcore shows that two lists have the same
perf_event_pmu_context, but not in the same order.

The problem is that the order of pmu_ctx_list for the parent is impacted by
the time when an event/PMU is added. While the order for a child is
impacted by the event order in the pinned_groups and flexible_groups. So
the order of pmu_ctx_list in the parent and child may be different.

To fix this problem, insert the perf_event_pmu_context to its proper place
after iteration of the pmu_ctx_list.

The follow testcase can trigger above warning:

# perf record -e cycles --call-graph lbr -- taskset -c 3 ./a.out &
# perf stat -e cpu-clock,cs -p xxx // xxx is the pid of a.out

test.c

void main() {
int count = 0;
pid_t pid;

printf("%d running\n", getpid());
sleep(30);
printf("running\n");

pid = fork();
if (pid == -1) {
printf("fork error\n");
return;
}
if (pid == 0) {
while (1) {
count++;
}
} else {
while (1) {
count++;
}
}
}

The testcase first opens an LBR event, so it will allocate task_ctx_data,
and then open tracepoint and software events, so the parent context will
have 3 different perf_event_pmu_contexts. On inheritance, child ctx will
insert the perf_event_pmu_context in another order and the warning will
trigger.

[ mingo: Tidied up the changelog. ]

Fixes: bd2756811766 ("perf: Rewrite core context handling")
Signed-off-by: Luo Gengkun <luogengkun@xxxxxxxxxxxxxxx>
Signed-off-by: Ingo Molnar <mingo@xxxxxxxxxx>
Reviewed-by: Kan Liang <kan.liang@xxxxxxxxxxxxxxx>
Link: https://lore.kernel.org/r/20250122073356.1824736-1-luogengkun@xxxxxxxxxxxxxxx
---
kernel/events/core.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 7dabbca..086d46d 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4950,7 +4950,7 @@ static struct perf_event_pmu_context *
find_get_pmu_context(struct pmu *pmu, struct perf_event_context *ctx,
struct perf_event *event)
{
- struct perf_event_pmu_context *new = NULL, *epc;
+ struct perf_event_pmu_context *new = NULL, *pos = NULL, *epc;
void *task_ctx_data = NULL;

if (!ctx->task) {
@@ -5007,12 +5007,19 @@ find_get_pmu_context(struct pmu *pmu, struct perf_event_context *ctx,
atomic_inc(&epc->refcount);
goto found_epc;
}
+ /* Make sure the pmu_ctx_list is sorted by PMU type: */
+ if (!pos && epc->pmu->type > pmu->type)
+ pos = epc;
}

epc = new;
new = NULL;

- list_add(&epc->pmu_ctx_entry, &ctx->pmu_ctx_list);
+ if (!pos)
+ list_add_tail(&epc->pmu_ctx_entry, &ctx->pmu_ctx_list);
+ else
+ list_add(&epc->pmu_ctx_entry, pos->pmu_ctx_entry.prev);
+
epc->ctx = ctx;

found_epc: