Re: [BUG 3.13.0-rc6] reiserfs possible circular locking dependency
From: Jeff Mahoney
Date: Wed Jan 15 2014 - 18:32:37 EST
On 1/3/14, 5:04 PM, Jeff Mahoney wrote:
> On 1/3/14, 2:46 PM, Linus Torvalds wrote:
>> On Fri, Jan 3, 2014 at 11:16 AM, Knut Petersen
>> <Knut_Petersen@xxxxxxxxxxx> wrote:
>>> Rebooting after a power failure on an openSuSE 13.1 system
>>> with kernel 3.13.0-rc6 triggered the attached lockdep warning.
>>
>> Hmm. It seems to be that the *normal* sequence should be:
>>
>> - get i_mutex, call lookup, which gets sbi->lock (reiserfs_write_lock)
>>
>> but in the mounting path, we have special circumstances.
>>
>> That finish_unfinished() function does
>>
>> - reiserfs_write_lock_nested() .
>> - remove_save_link
>> - iput(inode) with the write lock held
>>
>> and that can apparently end up taking i_mutex in open_xa_dir (and then
>> recursively the write lock, but that's an explicitly recursive lock,
>> so that part should be ok).
>>
>> Now, I don't think this can *really* deadlock with the normal order of
>> operations, because during mounting there is no other process that can
>> take those in the reverse order (since the filesystem isn't live), but
>> I do wonder if we should just release the reiserfs write lock over the
>> iputs. We release it in other parts anyway (like for the quota off)
>>
>> Jeff, you already touched this exact case in commit d2d0395fd177
>> ("reiserfs: locking, release lock around quota operations") except
>> that was for those quota operation cases.
>>
>> Even if it's not a real problem, making lockdep happy sounds like a
>> good idea. Of course, the trouble is that this code path almost never
>> gets exercised (which is why this hasn't been noticed earlier), so
>> testing...
>>
>> Jeff? Comments?
>
> If someone ever invents a time machine, I'd go back to 2004 and tell
> myself to fight harder to make a reiserfs v3.7 with real extended
> attribute items. This code will haunt me to my death.
>
> Anyway, yeah. The right thing here is to drop the lock for the iput.
> More than that would be ok too. finish_unfinished happens when the file
> system goes read-write and that includes the remount path. There can be
> other users of the file system but it would be a recursive acquire so we
> wouldn't actually deadlock there.
>
> I'll work something up over the weekend or on Monday.
As a quick update here, I do have patches to fix this particular issue
but it's tough to depend on xfstests to detect regressions when xfstests
causes other lockdep issues. I'm taking this as an opportunity to clean
up the locking enough to pass xfstests.
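
For anyone following along, the "drop the lock for the iput" idea discussed
above can be sketched in userspace roughly as follows. This is only an
illustration, not the actual patch: struct write_lock, struct fake_inode,
fake_iput() and finish_one() are made-up names, and a pthread recursive mutex
stands in for the reiserfs write lock. The point is the ordering: the write
lock is fully released (with its nesting depth saved) before the iput, so the
iput path takes i_mutex first and the write lock second, the same order as a
normal lookup.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for the reiserfs write lock: a recursive
 * mutex plus a nesting depth that is only touched while it is held. */
struct write_lock {
    pthread_mutex_t mutex;
    int depth;
};

/* Hypothetical stand-in for an inode; only i_mutex matters here. */
struct fake_inode {
    pthread_mutex_t i_mutex;
};

static void write_lock_init(struct write_lock *l)
{
    pthread_mutexattr_t attr;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    pthread_mutex_init(&l->mutex, &attr);
    pthread_mutexattr_destroy(&attr);
    l->depth = 0;
}

static void write_lock(struct write_lock *l)
{
    pthread_mutex_lock(&l->mutex);
    l->depth++;
}

static void write_unlock(struct write_lock *l)
{
    assert(l->depth > 0);
    l->depth--;
    pthread_mutex_unlock(&l->mutex);
}

/* Release every nesting level and return how many there were. */
static int write_unlock_nested(struct write_lock *l)
{
    int depth = l->depth;

    assert(depth > 0);
    l->depth = 0;
    for (int i = 0; i < depth; i++)
        pthread_mutex_unlock(&l->mutex);
    return depth;
}

/* Re-take the lock and restore the saved nesting depth. */
static void write_lock_nested(struct write_lock *l, int depth)
{
    for (int i = 0; i < depth; i++)
        pthread_mutex_lock(&l->mutex);
    l->depth = depth;
}

/* Models an iput() that ends up in the xattr code: i_mutex first,
 * then the write lock. Returns the depth it saw while holding it. */
static int fake_iput(struct write_lock *l, struct fake_inode *inode)
{
    int seen;

    pthread_mutex_lock(&inode->i_mutex);
    write_lock(l);
    seen = l->depth;
    write_unlock(l);
    pthread_mutex_unlock(&inode->i_mutex);
    return seen;
}

/* Models one iteration of the save-link cleanup with the lock
 * dropped around the iput. */
static int finish_one(struct write_lock *l, struct fake_inode *inode)
{
    int depth, seen;

    write_lock(l);
    /* ... save-link removal would happen here, under the lock ... */
    depth = write_unlock_nested(l);  /* drop the lock before iput */
    seen = fake_iput(l, inode);      /* i_mutex, then write lock */
    write_lock_nested(l, depth);     /* restore the previous state */
    write_unlock(l);
    return seen;
}
```

If finish_one() returns 1, the iput path acquired the write lock fresh rather
than nesting under a lock that was already held, which is the ordering lockdep
expects.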
-Jeff
--
Jeff Mahoney
SUSE Labs