Re: [REVIEW][PATCH] proc/sysctl: Don't grab i_lock under sysctl_lock.

From: Eric W. Biederman
Date: Tue Feb 21 2017 - 14:34:41 EST


Konstantin Khlebnikov <khlebnikov@xxxxxxxxxxxxxx> writes:

> On 21.02.2017 04:41, Eric W. Biederman wrote:
>>
>> Konstantin Khlebnikov <khlebnikov@xxxxxxxxxxxxxx> writes:
>>> This patch has locking problem. I've got lockdep splat under LTP.
>>>
>>> [ 6633.115456] ======================================================
>>> [ 6633.115502] [ INFO: possible circular locking dependency detected ]
>>> [ 6633.115553] 4.9.10-debug+ #9 Tainted: G L
>>> [ 6633.115584] -------------------------------------------------------
>>> [ 6633.115627] ksm02/284980 is trying to acquire lock:
>>> [ 6633.115659] (&sb->s_type->i_lock_key#4){+.+...}, at: [<ffffffff816bc1ce>] igrab+0x1e/0x80
>>> [ 6633.115834] but task is already holding lock:
>>> [ 6633.115882] (sysctl_lock){+.+...}, at: [<ffffffff817e379b>] unregister_sysctl_table+0x6b/0x110
>>> [ 6633.116026] which lock already depends on the new lock.
>>> [ 6633.116026]
>>> [ 6633.116080]
>>> [ 6633.116080] the existing dependency chain (in reverse order) is:
>>> [ 6633.116117]
>>> -> #2 (sysctl_lock){+.+...}:
>>> -> #1 (&(&dentry->d_lockref.lock)->rlock){+.+...}:
>>> -> #0 (&sb->s_type->i_lock_key#4){+.+...}:
>>>
>>> d_lock nests inside i_lock
>>> sysctl_lock nests inside d_lock in d_compare
>>>
>>> This patch adds i_lock nesting inside sysctl_lock.
>>
>> Al Viro <viro@xxxxxxxxxxxxxxxxxx> replied:
>>> Once ->unregistering is set, you can drop sysctl_lock just fine. So I'd
>>> try something like this - use rcu_read_lock() in proc_sys_prune_dcache(),
>>> drop sysctl_lock() before it and regain after. Make sure that no inodes
>>> are added to the list ones ->unregistering has been set and use RCU list
>>> primitives for modifying the inode list, with sysctl_lock still used to
>>> serialize its modifications.
>>>
>>> Freeing struct inode is RCU-delayed (see proc_destroy_inode()), so doing
>>> igrab() is safe there. Since we don't drop inode reference until after we'd
>>> passed beyond it in the list, list_for_each_entry_rcu() should be fine.
>>
>> I agree with Al Viro's analsysis of the situtation.
>>
>> Fixes: 802e348c6b77 ("proc/sysctl: prune stale dentries during unregistering")
>> Reported-by: Konstantin Khlebnikov <khlebnikov@xxxxxxxxxxxxxx>
>> Suggested-by: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
>> Signed-off-by: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx>
>> ---
>>
>> This is my cleaned up version of Al Viro's proposed fix.
>> I have tested it and the lockdep warnings go away, and
>> I have fixed a few trivial to ensure things work as intended.
>>
>> Unless someone sees a problem I am going to add this fix to my tree and
>> then send a pull request to Linus.
>
> I've tested the same patch and found no problems.
>
> Except proc_sys_prune_dcache() is no longer called under sysctl_lock
> like says comment above it.

Thank you. I will add your Tested-by line to the patch.

Eric