Re: [RFC][PATCH 0/5] perf/tracing/cpuhotplug: Fix locking order

From: Paul E. McKenney
Date: Fri May 12 2017 - 14:14:26 EST


On Fri, May 12, 2017 at 01:15:44PM -0400, Steven Rostedt wrote:
> NOTE: This was quickly written. The change logs and patches probably need
> some loving. This is for discussion. These may become legitimate patches,
> but for now, I'm seeing if this is an acceptable solution.
>
> Also note, I checked out the last branch that I had Linus pull, and then
> merged with tip's commit b53e5129~1, which is the commit just before
> "trace/perf: Cure hotplug lock ordering issues" in tip/smp/hotplug
> that placed get_online_cpus() ahead of event_mutex. The issue is that
> event_mutex is a high level mutex, and trying to make sure that
> get_online_cpus() is held whenever it is held and is going to be a
> nightmare. So I started ahead of that commit, but I needed changes
> in my own repo to do full testing. I renamed this temp branch to
> "tip/cpuhotplug", and you can pull it as well. But this is not going
> to be a candidate for pushing to Linus.
>
> The 5 patches here are:
>
> 1) fix a lockdep splat when taking stack trace in irqsoff tracing
> (one of my tests triggered this)
>
> 2) Allow for get_online_cpus() to nest
>
> 3) Have kprobes take get_online_cpus() before taking jump label lock.
>
> 4) Have tracepoints always take get_online_cpus
>
> 5) Have perf take event_mutex before taking get_online_cpus()
>
> Thoughts?

First a thought on the use of get_online_cpus()...

Your mileage may vary, but from what I have seen, life is very good
if your piece of the kernel uses get_online_cpus(), but refrains from
using the CPU-hotplug notifiers. Life is also very good if your piece
of the kernel uses CPU-hotplug notifiers, but refrains from using
get_online_cpus(). If your code uses both get_online_cpus() -and-
CPU-hotplug notifiers for the same data structures, life can be quite
complex and difficult.

For RCU, I cannot avoid using notifiers, so I have been removing uses
of get_online_cpus(). Removing each of these has made large numbers of
race conditions with CPU hotplug simply disappear -- not much reduction
in lines of code, but big reductions in complexity and in state space.
I have one more to go in _rcu_barrier().

Again, your mileage may vary.

Thanx, Paul

> -- Steve
>
>
> git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git
> tip/cpuhotplug
>
> Head SHA1: ddb06e08fa10ed5030285267a9b1e25d40c337c8
>
>
> Steven Rostedt (VMware) (5):
> tracing: Make sure RCU is watching before calling a stack trace
> cpu-hotplug: Allow get_online_cpus() to nest
> kprobes: Take get_online_cpus() before taking jump_label_lock()
> tracepoints: Grab get_online_cpus() before taking tracepoints_mutex
> perf: Grab event_mutex before taking get_online_cpus()
>
> ----
> include/linux/sched.h | 1 +
> include/linux/trace_events.h | 2 ++
> kernel/cpu.c | 9 +++++++++
> kernel/events/core.c | 9 +++++++++
> kernel/fork.c | 1 +
> kernel/kprobes.c | 6 +++---
> kernel/trace/trace.c | 5 +++++
> kernel/trace/trace.h | 1 -
> kernel/trace/trace_event_perf.c | 8 ++++----
> kernel/tracepoint.c | 5 +++++
> 10 files changed, 39 insertions(+), 8 deletions(-)
>