[BUG] jbd2: lockdep deadlock warning

From: Thomas Gleixner
Date: Fri Sep 16 2016 - 09:38:18 EST


Jan,

we are currently moving the RT patch to 4.8 and on our test runs we
triggered the following lockdep splat:

[ 153.887910] ======================================================
[ 153.887911] [ INFO: possible circular locking dependency detected ]
[ 153.887912] 4.8.0-rc6-rt0+ #679 Tainted: G W
[ 153.887912] -------------------------------------------------------
[ 153.887913] find/2537 is trying to acquire lock:
[ 153.887923] (&journal->j_state_lock){+.+...}, at: [<ffffffff81300ed0>] jbd2_log_start_commit+0x20/0x40
[ 153.887924]
[ 153.887924] but task is already holding lock:
[ 153.887925] (jbd2_handle){++++..}, at: [<ffffffff812f5411>] start_this_handle+0x111/0x470
[ 153.887925]
[ 153.887925] which lock already depends on the new lock.
[ 153.887925]
[ 153.887926]
[ 153.887926] the existing dependency chain (in reverse order) is:
[ 153.887927]
[ 153.887927] -> #1 (jbd2_handle){++++..}:
[ 153.887931] [<ffffffff810cfb5f>] lock_acquire+0x12f/0x2c0
[ 153.887932] [<ffffffff812f4fed>] add_transaction_credits+0x4d/0x310
[ 153.887933] [<ffffffff812f5409>] start_this_handle+0x109/0x470
[ 153.887933] [<ffffffff812f5d1b>] jbd2__journal_start+0xdb/0x380
[ 153.887936] [<ffffffff812d96a0>] __ext4_journal_start_sb+0x90/0x2b0
[ 153.887940] [<ffffffff8129d10e>] ext4_file_open+0x11e/0x230
[ 153.887946] [<ffffffff8120c808>] do_dentry_open.isra.16+0x168/0x310
[ 153.887948] [<ffffffff8120dbe5>] vfs_open+0x45/0x70
[ 153.887950] [<ffffffff8121f30c>] path_openat+0x48c/0xa20
[ 153.887951] [<ffffffff81220b9e>] do_filp_open+0x7e/0xe0
[ 153.887953] [<ffffffff81215164>] do_open_execat+0x64/0x150
[ 153.887954] [<ffffffff812174a8>] do_execveat_common.isra.43+0x278/0x950
[ 153.887955] [<ffffffff81217f30>] compat_SyS_execve+0x40/0x50
[ 153.887956] [<ffffffff810032d2>] do_fast_syscall_32+0xb2/0x2f0
[ 153.887960] [<ffffffff818e7b31>] entry_SYSENTER_compat+0x51/0x60
[ 153.887960]
[ 153.887960] -> #0 (&journal->j_state_lock){+.+...}:
[ 153.887962] [<ffffffff810cf54c>] __lock_acquire+0x105c/0x11d0
[ 153.887963] [<ffffffff810cfb5f>] lock_acquire+0x12f/0x2c0
[ 153.887966] [<ffffffff818e5e24>] rt_write_lock+0x34/0x50
[ 153.887968] [<ffffffff81300ed0>] jbd2_log_start_commit+0x20/0x40
[ 153.887968] [<ffffffff812f67a7>] jbd2_journal_stop+0x147/0x640
[ 153.887969] [<ffffffff812d98fd>] __ext4_journal_stop+0x3d/0xa0
[ 153.887972] [<ffffffff812ad72c>] ext4_dirty_inode+0x5c/0x70
[ 153.887975] [<ffffffff81243b4a>] __mark_inode_dirty+0x29a/0x9d0
[ 153.887984] [<ffffffff8122dfb6>] generic_update_time+0x76/0xc0
[ 153.887986] [<ffffffff81230058>] touch_atime+0x88/0xa0
[ 153.887987] [<ffffffff812248c0>] iterate_dir+0x170/0x190
[ 153.887990] [<ffffffff812719b6>] compat_SyS_getdents64+0x86/0xf0
[ 153.887991] [<ffffffff810032d2>] do_fast_syscall_32+0xb2/0x2f0
[ 153.887992] [<ffffffff818e7b31>] entry_SYSENTER_compat+0x51/0x60
[ 153.887992]
[ 153.887992] other info that might help us debug this:
[ 153.887992]
[ 153.887992] Possible unsafe locking scenario:
[ 153.887992]
[ 153.887993]        CPU0                    CPU1
[ 153.887993]        ----                    ----
[ 153.887993]   lock(jbd2_handle);
[ 153.887994]                                lock(&journal->j_state_lock);
[ 153.887994]                                lock(jbd2_handle);
[ 153.887995]   lock(&journal->j_state_lock);
[ 153.887995]
[ 153.887995] *** DEADLOCK ***
[ 153.887995]
[ 153.887996] 4 locks held by find/2537:
[ 153.888004] #0: (&f->f_pos_lock){+.+.+.}, at: [<ffffffff812323aa>] __fdget_pos+0x4a/0x50
[ 153.888007] #1: (&type->i_mutex_dir_key){++++++}, at: [<ffffffff810d4650>] rt_down_read+0x10/0x20
[ 153.888008] #2: (sb_writers#3){.+.+.+}, at: [<ffffffff8121272a>] __sb_start_write+0xda/0xf0
[ 153.888010] #3: (jbd2_handle){++++..}, at: [<ffffffff812f5411>] start_this_handle+0x111/0x470
[ 153.888010]
[ 153.888010] stack backtrace:
[ 153.888011] CPU: 2 PID: 2537 Comm: find Tainted: G W 4.8.0-rc6-rt0+ #679
[ 153.888012] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Debian-1.8.2-1 04/01/2014
[ 153.888014] 0000000000000000 ffff88003826fb20 ffffffff81438d04 ffffffff82a054c0
[ 153.888015] ffffffff82a054c0 ffff88003826fb60 ffffffff810ccd3e 0000000038d8db70
[ 153.888016] ffff880038d8dba8 ffff880038d8db70 5b9d99a44104f32e ffff880038d8d200
[ 153.888016] Call Trace:
[ 153.888019] [<ffffffff81438d04>] dump_stack+0x86/0xc2
[ 153.888021] [<ffffffff810ccd3e>] print_circular_bug+0x1be/0x210
[ 153.888022] [<ffffffff810cf54c>] __lock_acquire+0x105c/0x11d0
[ 153.888024] [<ffffffff810ca83e>] ? put_lock_stats.isra.28+0xe/0x50
[ 153.888025] [<ffffffff810cfb5f>] lock_acquire+0x12f/0x2c0
[ 153.888026] [<ffffffff81300ed0>] ? jbd2_log_start_commit+0x20/0x40
[ 153.888028] [<ffffffff818e5e24>] rt_write_lock+0x34/0x50
[ 153.888029] [<ffffffff81300ed0>] ? jbd2_log_start_commit+0x20/0x40
[ 153.888030] [<ffffffff81300ed0>] jbd2_log_start_commit+0x20/0x40
[ 153.888031] [<ffffffff812f67a7>] jbd2_journal_stop+0x147/0x640
[ 153.888032] [<ffffffff812a7bbf>] ? ext4_mark_iloc_dirty+0x4df/0x7d0
[ 153.888033] [<ffffffff812d9c49>] ? __ext4_journal_get_write_access+0x49/0x90
[ 153.888034] [<ffffffff812ad718>] ? ext4_dirty_inode+0x48/0x70
[ 153.888035] [<ffffffff812d98fd>] __ext4_journal_stop+0x3d/0xa0
[ 153.888036] [<ffffffff812ad72c>] ext4_dirty_inode+0x5c/0x70
[ 153.888037] [<ffffffff81243b4a>] __mark_inode_dirty+0x29a/0x9d0
[ 153.888038] [<ffffffff8122dfb6>] generic_update_time+0x76/0xc0
[ 153.888039] [<ffffffff81230058>] touch_atime+0x88/0xa0
[ 153.888040] [<ffffffff812248c0>] iterate_dir+0x170/0x190
[ 153.888041] [<ffffffff812719b6>] compat_SyS_getdents64+0x86/0xf0
[ 153.888042] [<ffffffff8126ff80>] ? C_SYSC_newstat+0x30/0x30
[ 153.888046] [<ffffffff81181523>] ? __context_tracking_exit.part.5+0x33/0x1c0
[ 153.888048] [<ffffffff810ce100>] ? trace_hardirqs_on_caller+0x100/0x1c0
[ 153.888049] [<ffffffff810032d2>] do_fast_syscall_32+0xb2/0x2f0
[ 153.888050] [<ffffffff818e7b31>] entry_SYSENTER_compat+0x51/0x60
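
Spelled out in forward order, the two chains boil down to the following
inversion (my paraphrase of the splat, simplified, not verbatim code):

  /* existing dependency (#1), ext4_file_open() path: */
  start_this_handle()
      read_lock(&journal->j_state_lock);
      add_transaction_credits()
          lock_acquire(jbd2_handle);		/* handle lockdep map */

  /* new dependency (#0), touch_atime() -> ext4_dirty_inode() path: */
  jbd2_journal_stop()				/* jbd2_handle still held */
      jbd2_log_start_commit()
          write_lock(&journal->j_state_lock);	/* rt_write_lock() on RT */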

The only change in the ext4/jbd2 area on RT is the conversion of the bit
spinlocks in the buffer head to real spinlocks, but that is not involved
here.
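
For reference, that RT change is roughly of the following shape (a sketch
from memory of the RT queue, helper and field names approximate, not the
exact hunk):

  /* mainline, e.g. end_buffer_async_read(): bit spinlock in bh->b_state */
  local_irq_save(flags);
  bit_spin_lock(BH_Uptodate_Lock, &first->b_state);
  ...
  bit_spin_unlock(BH_Uptodate_Lock, &first->b_state);
  local_irq_restore(flags);

  /* RT: replaced by a real spinlock_t embedded in struct buffer_head */
  flags = bh_uptodate_lock_irqsave(first);	/* spin_lock_irqsave(&bh->b_uptodate_lock, flags) */
  ...
  bh_uptodate_unlock_irqrestore(first, flags);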

So this looks like a genuine issue which might be easier to expose with RT.

Thanks,

tglx