Re: [PATCH] perf: Optimize perf_pmu_migrate_context()

From: Paul E. McKenney
Date: Mon Apr 03 2023 - 18:51:43 EST


On Tue, Apr 04, 2023 at 12:07:30AM +0200, Thomas Gleixner wrote:
> On Mon, Apr 03 2023 at 11:08, Peter Zijlstra wrote:
> > Thomas reported that offlining CPUs spends a lot of time in
> > synchronize_rcu() as called from perf_pmu_migrate_context() even though
> > he's not actually using uncore events.
>
> That happens when offlining CPUs from a socket > 0 in the same order in
> which those CPUs were brought up. On socket 0 this is not observable
> unless the bogus CPU0 offlining hack is enabled.
>
> If the offlining happens in the reverse order then all is shiny.
>
> The reason is that the first online CPU on a socket gets the uncore
> events assigned and when it is offlined then those are moved to the next
> online CPU in the same socket.
>
> On a SKL-X with 56 threads per socket this results in a whopping _1_
> second delay per thread (except for the last one which shuts down the
> per socket uncore events with no delay because there are no users) due
> to 62 pointless synchronize_rcu() invocations, each of which takes
> ~16ms on a HZ=250 kernel.
>
> Which in turn is interesting because that machine is completely idle
> other than running the offline muck...
>
> > Turns out, the thing is unconditionally waiting for RCU, even if there's
> > no actual events to migrate.
> >
> > Fixes: 0cda4c023132 ("perf: Introduce perf_pmu_migrate_context()")
> > Reported-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
> > Tested-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
>
> Reviewed-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>

Yow! ;-)

Assuming that all the events run under RCU protection, as in preemption
disabled:

Reviewed-by: Paul E. McKenney <paulmck@xxxxxxxxxx>
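
For readers following along, here is a rough user-space model of the
behaviour discussed above. It is only a sketch, not the actual patch:
every name in it (model_ctx, model_synchronize_rcu(),
model_migrate_context(), model_offline_socket(), model_delay_ms()) is
made up for illustration. It models a single PMU context per CPU and
replaces the grace-period wait with a counter. The point it tries to
show is that the wait can be skipped when the source context has no
events to migrate, and that 62 waits of ~16ms each add up to roughly
one second.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_CPUS 256

/* Hypothetical model of a per-CPU PMU context; not a kernel structure. */
struct model_ctx {
	int nr_events;	/* events currently attached to this context */
};

/* Counts simulated grace-period waits. */
static int grace_periods_waited;

/* Stand-in for synchronize_rcu(): it only counts invocations. */
static void model_synchronize_rcu(void)
{
	grace_periods_waited++;
}

/*
 * Move all events from src to dst. The grace-period wait is done only
 * if something was actually detached from src.
 */
static void model_migrate_context(struct model_ctx *src, struct model_ctx *dst)
{
	int moved;

	assert(src != NULL && dst != NULL && src != dst);

	moved = src->nr_events;
	src->nr_events = 0;

	if (moved == 0)
		return;		/* nothing to migrate, so skip the wait */

	model_synchronize_rcu();
	dst->nr_events += moved;
}

/*
 * Offline the CPUs of one socket in bring-up order. The first CPU owns
 * the uncore events, and each offlined CPU hands them to the next one.
 * Returns the number of simulated grace-period waits.
 */
static int model_offline_socket(int ncpus, int uncore_events)
{
	struct model_ctx ctx[MODEL_MAX_CPUS] = { { 0 } };
	int cpu;

	assert(ncpus >= 1 && ncpus <= MODEL_MAX_CPUS);

	grace_periods_waited = 0;
	ctx[0].nr_events = uncore_events;

	/* The last CPU has no successor, so it migrates nothing. */
	for (cpu = 0; cpu < ncpus - 1; cpu++)
		model_migrate_context(&ctx[cpu], &ctx[cpu + 1]);

	return grace_periods_waited;
}

/* Total delay in milliseconds for a number of grace-period waits. */
static int model_delay_ms(int waits, int ms_per_wait)
{
	return waits * ms_per_wait;
}
```

In this model, model_offline_socket(56, 0) waits for no grace period at
all, while the same call with a non-zero event count waits once per
migration. model_delay_ms(62, 16) gives 992, which matches the ~1 second
per thread quoted above.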