Re: [PATCH v2 35/40] tracing: Reverse the order event_mutex/trace_types_lock are taken

From: Steven Rostedt
Date: Fri Sep 08 2017 - 15:42:51 EST


On Fri, 8 Sep 2017 15:31:35 -0400
Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:

> On Tue, 5 Sep 2017 16:57:47 -0500
> Tom Zanussi <tom.zanussi@xxxxxxxxxxxxxxx> wrote:
>
> > Change the order event_mutex and trace_types_lock are taken, to avoid
> > circular dependencies and lockdep spew.
> >
> > Changing the order shouldn't matter to any current code, but does to
> > anything that takes the event_mutex first and then trace_types_lock.
> > This is the case when calling tracing_set_clock from inside an event
> > command, which already holds the event_mutex.
>
> This is a very scary patch. I'll apply it and run a bunch of tests with
> lockdep enabled. Let's see what blows up (or not).

Boom!

======================================================
WARNING: possible circular locking dependency detected
4.13.0-rc7-test+ #84 Not tainted
------------------------------------------------------
mkdir/1674 is trying to acquire lock:
(event_mutex){+.+.+.}, at: [<ffffffff811b18bd>] event_trace_add_tracer+0x1d/0xb0

but task is already holding lock:
(trace_types_lock){+.+.+.}, at: [<ffffffff811a121f>] instance_mkdir+0x2f/0x250

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #1 (trace_types_lock){+.+.+.}:
lock_acquire+0xe3/0x1d0
__mutex_lock+0x81/0x950
mutex_lock_nested+0x1b/0x20
trace_module_notify+0x33/0x1b0
notifier_call_chain+0x4a/0x70
__blocking_notifier_call_chain+0x4d/0x70
blocking_notifier_call_chain+0x16/0x20
load_module+0x21df/0x2dd0
SYSC_finit_module+0xbc/0xf0
SyS_finit_module+0xe/0x10
do_syscall_64+0x62/0x140
return_from_SYSCALL_64+0x0/0x7a

-> #0 (event_mutex){+.+.+.}:
__lock_acquire+0x1026/0x11d0
lock_acquire+0xe3/0x1d0
__mutex_lock+0x81/0x950
mutex_lock_nested+0x1b/0x20
event_trace_add_tracer+0x1d/0xb0
instance_mkdir+0x173/0x250
tracefs_syscall_mkdir+0x40/0x70
vfs_mkdir+0xfb/0x190
SyS_mkdir+0x6b/0xd0
entry_SYSCALL_64_fastpath+0x1f/0xbe

other info that might help us debug this:

Possible unsafe locking scenario:

CPU0 CPU1
---- ----
lock(trace_types_lock);
lock(event_mutex);
lock(trace_types_lock);
lock(event_mutex);

*** DEADLOCK ***

2 locks held by mkdir/1674:
#0: (sb_writers#10){.+.+.+}, at: [<ffffffff812b4824>] mnt_want_write+0x24/0x50
#1: (trace_types_lock){+.+.+.}, at: [<ffffffff811a121f>] instance_mkdir+0x2f/0x250

stack backtrace:
CPU: 3 PID: 1674 Comm: mkdir Not tainted 4.13.0-rc7-test+ #84
Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016
Call Trace:
dump_stack+0x86/0xcf
print_circular_bug+0x1be/0x210
__lock_acquire+0x1026/0x11d0
lock_acquire+0xe3/0x1d0
? lock_acquire+0xe3/0x1d0
? event_trace_add_tracer+0x1d/0xb0
? event_trace_add_tracer+0x1d/0xb0
__mutex_lock+0x81/0x950
? event_trace_add_tracer+0x1d/0xb0
? event_trace_add_tracer+0x1d/0xb0
? __create_dir+0xcb/0x130
mutex_lock_nested+0x1b/0x20
? mutex_lock_nested+0x1b/0x20
event_trace_add_tracer+0x1d/0xb0
instance_mkdir+0x173/0x250
tracefs_syscall_mkdir+0x40/0x70
? tracefs_syscall_mkdir+0x40/0x70
vfs_mkdir+0xfb/0x190
SyS_mkdir+0x6b/0xd0
entry_SYSCALL_64_fastpath+0x1f/0xbe
RIP: 0033:0x7f4867afa947
RSP: 002b:00007ffd3dc35c08 EFLAGS: 00000202 ORIG_RAX: 0000000000000053
RAX: ffffffffffffffda RBX: 0000000000000046 RCX: 00007f4867afa947
RDX: 0000000000000000 RSI: 00000000000001ff RDI: 00007ffd3dc3764f
RBP: ffffc90000f2ff98 R08: 00000000000001ff R09: 000055f96a551ac0
R10: 000055f96b700010 R11: 0000000000000202 R12: 0000000000000001
R13: 00007ffd3dc35f28 R14: 00000000000001ff R15: ffffffff8189316a
? entry_SYSCALL_64_after_swapgs+0x17/0x4f

It appears to be caused by instance creation. I'll look at that.


-- Steve