Re: [PATCH] perf: Optimize perf_pmu_migrate_context()

From: Thomas Gleixner
Date: Mon Apr 03 2023 - 18:07:37 EST


On Mon, Apr 03 2023 at 11:08, Peter Zijlstra wrote:
> Thomas reported that offlining CPUs spends a lot of time in
> synchronize_rcu() as called from perf_pmu_migrate_context() even though
> he's not actually using uncore events.

That happens when CPUs on a socket > 0 are offlined in the same order in
which they were brought up. On socket 0 this is not observable unless
the bogus CPU0 offlining hack is enabled.

If the offlining happens in reverse order, then all is shiny.

The reason is that the uncore events are assigned to the first online
CPU on a socket, and when that CPU is offlined they are moved to the
next online CPU on the same socket.

On a SKL-X with 56 threads per socket this results in a whopping _1_
second delay per thread (except for the last one, which shuts down the
per-socket uncore events without delay because there are no users). The
delay comes from 62 pointless synchronize_rcu() invocations, each of
which takes ~16ms on a HZ=250 kernel.

That in turn is interesting, because the machine is completely idle
apart from running the offline muck...

> Turns out, the thing is unconditionally waiting for RCU, even if there are
> no actual events to migrate.
>
> Fixes: 0cda4c023132 ("perf: Introduce perf_pmu_migrate_context()")
> Reported-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
> Tested-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>

Reviewed-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>