Re: [RFC 2/2] perf: Sharing PMU counters across compatible events

From: Peter Zijlstra
Date: Mon May 28 2018 - 08:40:45 EST


On Fri, May 04, 2018 at 04:11:02PM -0700, Song Liu wrote:
> Connections between perf_event and perf_event_dup are built by the function
> rebuild_event_dup_list(cpuctx). This function is only called when events
> are added/removed or when a task is scheduled in/out, so it is not on the
> critical path of perf_rotate_context().

Why is perf_rotate_context() the only critical path? I would say the
context switch path is rather critical too.

> @@ -2919,8 +3014,10 @@ static void ctx_sched_out(struct perf_event_context *ctx,
>
>  	if (ctx->task) {
>  		WARN_ON_ONCE(cpuctx->task_ctx != ctx);
> -		if (!ctx->is_active)
> +		if (!ctx->is_active) {
>  			cpuctx->task_ctx = NULL;
> +			rebuild_event_dup_list(cpuctx);
> +		}
>  	}
>
>  	/*
> /*

> +static void rebuild_event_dup_list(struct perf_cpu_context *cpuctx)
> +{
> +	int dup_count = cpuctx->ctx.nr_events;
> +	struct perf_event_context *ctx = cpuctx->task_ctx;
> +	struct sched_in_data sid = {
> +		.ctx = ctx,
> +		.cpuctx = cpuctx,
> +		.can_add_hw = 1,
> +	};
> +
> +	if (ctx)
> +		dup_count += ctx->nr_events;
> +
> +	kfree(cpuctx->dup_event_list);
> +	cpuctx->dup_event_count = 0;
> +
> +	cpuctx->dup_event_list =
> +		kzalloc(sizeof(struct perf_event_dup) * dup_count, GFP_ATOMIC);


__schedule()
  local_irq_disable()
  raw_spin_lock(rq->lock)
  context_switch()
    prepare_task_switch()
      perf_event_task_sched_out()
        __perf_event_task_sched_out()
          perf_event_context_sched_out()
            task_ctx_sched_out()
              ctx_sched_out()
                rebuild_event_dup_list()
                  kzalloc()
                    ...
                      spin_lock()

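Also, as per the above, this nests a regular spin lock inside the
(raw) rq->lock, which is a no-no: on PREEMPT_RT a spinlock_t becomes a
sleeping lock, and it must not be acquired while holding a raw_spinlock_t
with interrupts disabled.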
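A minimal userspace sketch of the usual way around this, assuming the list size can be known in a context that is allowed to sleep (for example the event add/remove path rather than ctx_sched_out()): reserve the memory there, and have the switch path only fill what was reserved, failing rather than allocating. struct dup_buf, dup_buf_reserve(), dup_buf_fill() and dup_buf_free() are hypothetical names, realloc() merely stands in for a sleeping kernel allocation, and this is not what the patch does. It only moves the allocation out from under rq->lock; the fill is still a walk over every event.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct perf_event_dup. */
struct dup_entry {
	int event_id;  /* stand-in for a struct perf_event pointer */
	int master_id; /* event whose counter would be shared (not computed here) */
};

struct dup_buf {
	struct dup_entry *entries;
	size_t capacity; /* slots reserved so far */
	size_t count;    /* slots filled by the last dup_buf_fill() */
};

/* Sleepable context only: may allocate (realloc models a GFP_KERNEL call). */
int dup_buf_reserve(struct dup_buf *buf, size_t needed)
{
	struct dup_entry *p;

	if (needed <= buf->capacity)
		return 0; /* already big enough; never shrinks */
	if (needed > SIZE_MAX / sizeof(*p))
		return -1; /* size computation would overflow */
	p = realloc(buf->entries, needed * sizeof(*p));
	if (!p)
		return -1;
	buf->entries = p;
	buf->capacity = needed;
	return 0;
}

/* Models the switch path under rq->lock: never allocates or frees. */
int dup_buf_fill(struct dup_buf *buf, const int *ids, size_t n)
{
	size_t i;

	if (n > buf->capacity)
		return -1; /* not enough reserved; refuse instead of allocating */
	for (i = 0; i < n; i++) {
		buf->entries[i].event_id = ids[i];
		buf->entries[i].master_id = ids[i];
	}
	buf->count = n;
	return 0;
}

/* Sleepable context only: releases the reserved memory. */
void dup_buf_free(struct dup_buf *buf)
{
	free(buf->entries);
	buf->entries = NULL;
	buf->capacity = 0;
	buf->count = 0;
}
```

The design choice is that the atomic path reports failure instead of trying to grow the buffer, so nothing inside the locked region can reach the allocator.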

Not to mention that whole O(n) crud in the scheduling path...
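One hypothetical way to keep the per-switch cost constant is to maintain each context's list incrementally when events are added or removed (an amortized O(1) append and an O(1) swap-remove), so that ctx_sched_out() walks nothing. struct ev, struct ev_list, ev_list_add() and ev_list_del() are illustrative names only, not perf code, and the sketch leaves out the actual matching of compatible events, which would still have to happen somewhere off the scheduling path.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a perf_event tracked in a per-context list. */
struct ev {
	int id;
	size_t slot; /* index of this event in its list */
};

struct ev_list {
	struct ev **items;
	size_t nr, cap;
};

/* Add path (may sleep): amortized O(1) append, doubling capacity as needed. */
int ev_list_add(struct ev_list *l, struct ev *e)
{
	if (l->nr == l->cap) {
		size_t ncap = l->cap ? 2 * l->cap : 4;
		struct ev **p;

		if (ncap > SIZE_MAX / sizeof(*p))
			return -1; /* size computation would overflow */
		p = realloc(l->items, ncap * sizeof(*p));
		if (!p)
			return -1;
		l->items = p;
		l->cap = ncap;
	}
	e->slot = l->nr;
	l->items[l->nr++] = e;
	return 0;
}

/*
 * Remove path: O(1), moves the last element into the freed slot.
 * Precondition: e is currently in l (so l->nr > 0).
 */
void ev_list_del(struct ev_list *l, struct ev *e)
{
	struct ev *last = l->items[--l->nr];

	l->items[e->slot] = last;
	last->slot = e->slot;
}
```

Swap-remove does not preserve order; if the sharing logic depended on event order, that would have to be handled separately.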