Re: [RFC][PATCH 0/5] perf/tracing/cpuhotplug: Fix locking order

From: Paul E. McKenney
Date: Tue May 16 2017 - 08:46:17 EST


On Tue, May 16, 2017 at 10:19:23AM +0200, Peter Zijlstra wrote:
> On Mon, May 15, 2017 at 11:40:43AM -0700, Paul E. McKenney wrote:
>
> > Given that you acquire the global pmus_lock when doing the
> > get_online_cpus(), and given that CPU hotplug is rare, is it possible
> > to momentarily acquire the global pmus_lock in perf_event_init_cpu()
> > and perf_event_exit_cpu() and interact directly with that? Then perf
> > would presumably leave alone any outgoing CPU that had already executed
> > perf_event_exit_cpu(), and also any incoming CPU that had not already
> > executed perf_event_init_cpu().
> >
> > What prevents this approach from working?
>
> Lack of sleep probably ;-)

I know that feeling...

> I'd blame the kids, but those have actually been very good lately.

I don't get that excuse anymore; mine are all on their own. So I need
to come up with some fresh excuses. ;-)

> You're suggesting the below on top, right? I'll run it with lockdep
> enabled after I chase some regression..

Something like this, yes. Maybe even exactly like this. ;-)

> ---
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -8997,7 +8997,6 @@ int perf_pmu_register(struct pmu *pmu, c
>  {
>          int cpu, ret;
>
> -        get_online_cpus();
>          mutex_lock(&pmus_lock);
>          ret = -ENOMEM;
>          pmu->pmu_disable_count = alloc_percpu(int);

There is usually also some state check somewhere in here for whether
the CPU is offline from a perf perspective. Such a check might already
exist, but I must plead ignorance of perf.
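
To make that hand-waving a bit more concrete, the check I have in mind
might look something like the following, somewhere in the per-CPU setup
that perf_pmu_register() does. Note that perf_online_mask is entirely
made up here: a hypothetical cpumask protected by pmus_lock, set by
perf_event_init_cpu() and cleared by perf_event_exit_cpu().

	/*
	 * Sketch only: perf_online_mask is hypothetical, updated by
	 * the hotplug callbacks under pmus_lock, so testing it here
	 * (also under pmus_lock) says whether perf may use this CPU.
	 */
	for_each_possible_cpu(cpu) {
		struct perf_cpu_context *cpuctx;

		cpuctx = per_cpu_ptr(pmu->pmu_cpu_context, cpu);
		cpuctx->online = cpumask_test_cpu(cpu, perf_online_mask);
	}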

> @@ -9093,7 +9092,6 @@ int perf_pmu_register(struct pmu *pmu, c
>          ret = 0;
>  unlock:
>          mutex_unlock(&pmus_lock);
> -        put_online_cpus();
>
>          return ret;
>
> @@ -11002,10 +11000,9 @@ static void perf_event_exit_cpu_context(
>          struct perf_cpu_context *cpuctx;
>          struct perf_event_context *ctx;
>          struct pmu *pmu;
> -        int idx;
>
> -        idx = srcu_read_lock(&pmus_srcu);
> -        list_for_each_entry_rcu(pmu, &pmus, entry) {
> +        mutex_lock(&pmus_lock);

If the state change checked for by perf_pmu_register() also needs to
be guarded by ctx->mutex, this looks right to me.

Just for completeness, the other style is to maintain separate per-CPU
state, in which case you would instead acquire pmus_lock, mark this
CPU off-limits to more perf_pmu_register() usage, release pmus_lock,
then clean up any old usage.
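
A sketch of that second style, again using a purely hypothetical
perf_online_mask rather than anything that exists in perf today:

	static void perf_event_exit_cpu_context(int cpu)
	{
		/* Make this CPU off-limits to perf_pmu_register()... */
		mutex_lock(&pmus_lock);
		cpumask_clear_cpu(cpu, perf_online_mask);
		mutex_unlock(&pmus_lock);

		/* ...then clean up any old usage, with the lock dropped. */
	}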

The approach you have here seems to work best when the cleanup and
initialization naturally mark the CPU as off-limits and ready,
respectively. The other style seems to work best when you need a
separate indication of which CPUs are off-limits and which are usable.

RCU is an example of the other style, with the rcu_node structure's
->qsmaskinitnext mask serving to mark which CPUs are usable. One
reason that the other style works so well for RCU is that a CPU coming
online has no effect on the current grace period, so rcu_cpu_starting()
just sets the CPU's bit in ->qsmaskinitnext, which takes effect only
once the next grace period starts.
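
From memory, and with the details heavily simplified, the incoming-CPU
side of that is roughly the following; the newly set bit is not
consulted until the next grace period initializes:

	void rcu_cpu_starting(unsigned int cpu)
	{
		unsigned long flags;
		struct rcu_data *rdp;
		struct rcu_node *rnp;
		struct rcu_state *rsp;

		for_each_rcu_flavor(rsp) {
			rdp = per_cpu_ptr(rsp->rda, cpu);
			rnp = rdp->mynode;
			raw_spin_lock_irqsave_rcu_node(rnp, flags);
			rnp->qsmaskinitnext |= rdp->grpmask; /* CPU usable. */
			raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
		}
	}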

It is quite possible that many of the other use cases instead need to
use something like what you have here. I suspect that the common case
is that a CPU appearing or disappearing must have some immediate effect.

> +        list_for_each_entry(pmu, &pmus, entry) {
>                  cpuctx = per_cpu_ptr(pmu->pmu_cpu_context, cpu);
>                  ctx = &cpuctx->ctx;
>
> @@ -11014,7 +11011,7 @@ static void perf_event_exit_cpu_context(
>                  cpuctx->online = 0;
>                  mutex_unlock(&ctx->mutex);
>          }
> -        srcu_read_unlock(&pmus_srcu, idx);
> +        mutex_unlock(&pmus_lock);
>  }
>  #else
>
> @@ -11027,12 +11024,11 @@ int perf_event_init_cpu(unsigned int cpu
>          struct perf_cpu_context *cpuctx;
>          struct perf_event_context *ctx;
>          struct pmu *pmu;
> -        int idx;
>
>          perf_swevent_init_cpu(cpu);
>
> -        idx = srcu_read_lock(&pmus_srcu);
> -        list_for_each_entry_rcu(pmu, &pmus, entry) {
> +        mutex_lock(&pmus_lock);
> +        list_for_each_entry(pmu, &pmus, entry) {
>                  cpuctx = per_cpu_ptr(pmu->pmu_cpu_context, cpu);
>                  ctx = &cpuctx->ctx;
>
> @@ -11040,7 +11036,7 @@ int perf_event_init_cpu(unsigned int cpu
>                  cpuctx->online = 1;
>                  mutex_unlock(&ctx->mutex);
>          }
> -        srcu_read_unlock(&pmus_srcu, idx);
> +        mutex_unlock(&pmus_lock);

And same here.

Again for completeness, the other style would be to mark this CPU
as ready for perf usage at the very end, protected by pmus_lock.
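
In sketch form, with the same hypothetical perf_online_mask as above:

	int perf_event_init_cpu(unsigned int cpu)
	{
		/* ... set up this CPU's contexts ... */

		/* Only now does perf_pmu_register() see this CPU. */
		mutex_lock(&pmus_lock);
		cpumask_set_cpu(cpu, perf_online_mask);
		mutex_unlock(&pmus_lock);
		return 0;
	}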

Thanx, Paul

>          return 0;
>  }
>