Re: possible deadlock in perf_event_ctx_lock_nested

From: Peter Zijlstra
Date: Fri Oct 27 2017 - 11:34:06 EST


On Fri, Oct 27, 2017 at 05:11:37PM +0200, Peter Zijlstra wrote:
> On Fri, Oct 27, 2017 at 01:30:30AM -0700, syzbot wrote:
>
> > ======================================================
> > WARNING: possible circular locking dependency detected
> > 4.13.0-next-20170911+ #19 Not tainted
> > ------------------------------------------------------
> > syz-executor2/12380 is trying to acquire lock:
> > (&ctx->mutex){+.+.}, at: [<ffffffff8180923c>]
> > perf_event_ctx_lock_nested+0x1dc/0x3c0 kernel/events/core.c:1210
> >
> > but task is already holding lock:
> > (&pipe->mutex/1){+.+.}, at: [<ffffffff81ac0fa6>] pipe_lock_nested
> > fs/pipe.c:66 [inline]
> > (&pipe->mutex/1){+.+.}, at: [<ffffffff81ac0fa6>] pipe_lock+0x56/0x70
> > fs/pipe.c:74
> >
> > which lock already depends on the new lock.
>
>
> ARRGH!!
>
> that translates to the below, which is an absolute maze and requires
> at least 5 concurrent callstacks, possibly more.
>
> We already had a lot of fun with hotplug-perf-ftrace, but the below
> contains more. Let me try and page that previous crap back.
>
>
>
> perf_ioctl()
>   #0 perf_event_ctx_lock() [ctx->mutex]
>   perf_event_set_filter
>     #1 ftrace_profile_set_filter [event_mutex]
>
>
> sys_perf_event_open
>   ...
>   perf_trace_init
>     #1 mutex_lock [event_mutex]
>     trace_event_reg
>       tracepoint_probe_register
>         #2 mutex_lock() [tracepoints_mutex]
>         tracepoint_add_func()
>           #3 static_key_slow_inc() [cpuhotplug_lock]
>
>
> cpuhp_setup_state_nocalls
>   #3 cpus_read_lock [cpuhotplug_lock]
>   __cpuhp_setup_state_cpuslocked
>     #4 mutex_lock [cpuhp_state_mutex]
>     cpuhp_issue_call
>       #5 cpuhp_invoke_ap_callback() [cpuhp_state]
>
>
> #5 cpuhp_invoke_callback [cpuhp_state]
>   ...
>   devtmpfs_create_node
>     #6 wait_for_completion() [&req.done]
>
> devtmpfsd
>   handle_create
>     #7 filename_create [sb_writers]
>     #6 complete [&req.done]
>
>
> do_splice
>   #7 file_start_write() [sb_writers]
>   do_splice_from
>     iter_file_splice_write
>       #8 pipe_lock [pipe->mutex]
>
>
> do_splice
>   #8 pipe_lock [pipe->mutex]
>   do_splice_to
>     ...
>       #0 perf_read() [ctx->mutex]
>
>

So arguably that last op, a splice_read() on a perf fd, is fairly
pointless and we could disallow it. How about something like the
below?

---
kernel/events/core.c | 8 ++++++++
1 file changed, 8 insertions(+)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 04989fb769f0..fd03f3082ee3 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5468,6 +5468,13 @@ static int perf_fasync(int fd, struct file *filp, int on)
 	return 0;
 }
 
+static ssize_t perf_splice_read(struct file *file, loff_t *ppos,
+				struct pipe_inode_info *pipe, size_t len,
+				unsigned int flags)
+{
+	return -EOPNOTSUPP;
+}
+
 static const struct file_operations perf_fops = {
 	.llseek = no_llseek,
 	.release = perf_release,
@@ -5477,6 +5484,7 @@ static int perf_fasync(int fd, struct file *filp, int on)
 	.compat_ioctl = perf_compat_ioctl,
 	.mmap = perf_mmap,
 	.fasync = perf_fasync,
+	.splice_read = perf_splice_read,
 };
 
 /*