Re: [PATCH RFC] ashmem: Fix lockdep RECLAIM_FS false positive
From: Joel Fernandes
Date: Wed Feb 07 2018 - 17:27:46 EST
Hi Peter,
On Wed, Feb 7, 2018 at 8:58 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
[...]
>
>> Lockdep reports this issue when GFP_FS is in fact set, and we enter
>> this path and acquire the lock. So lockdep seems to be doing the right
>> thing; however, by design it is reporting a false positive.
>
> So I'm not seeing how it's a false positive. fs/inode.c sets a different
> lock class per filesystem type. So recursing on an i_mutex within a
> filesystem does sound dodgy.
But directory inodes and file inodes in the same filesystem share the
same lock class, right? All the issues I've seen (both ours and
Neil's) are similar: a directory inode's lock is held across a
RECLAIM_FS allocation while, in parallel, another thread is doing
memory reclaim involving the same FS. In the splat I shared, d_alloc
is called during the VFS lookup with an inode's lock held (I am
guessing this is the lock of the directory inode, which is locked just
before the d_alloc), and in parallel kswapd or some other thread is
doing memory reclaim.
>> The real issue is that the lock being acquired is of the same lock
>> class and a different lock instance is acquired under GFP_FS that
>> happens to be of the same class.
>>
>> So the issue seems to me to be:
>> Process A                 kswapd
>> ---------                 ------
>> acquire i_mutex           Enter RECLAIM_FS
>>
>> Enter RECLAIM_FS          acquire different i_mutex
>
> That's not a false positive, that's a 2 process way of writing i_mutex
> recursion.
Yes, but I say false positive because the kswapd->ashmem_shrink_scan
path can never acquire the mutex of a directory inode, AFAIK. So from
that perspective it looks like a false positive.
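To make the class-versus-instance point concrete, here is a toy
userspace model in Python. It is only a sketch, not how lockdep is
implemented: ToyLockdep, Lock and the class name "i_mutex/fs_x" are
made up for illustration, and it assumes that the directory inode and
the inode locked from the shrinker share one lock class.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Lock:
    """One lock instance. The checker below only ever looks at lock_class."""
    name: str
    lock_class: str


@dataclass
class ToyLockdep:
    """Class-level bookkeeping, loosely modelled on the RECLAIM_FS check."""
    held_over_reclaim: set = field(default_factory=set)  # classes held when entering reclaim
    taken_in_reclaim: set = field(default_factory=set)   # classes acquired inside reclaim

    def enter_reclaim(self, held):
        """A task makes a GFP_FS allocation while holding the locks in `held`."""
        self.held_over_reclaim |= {lock.lock_class for lock in held}
        return self.conflicts()

    def acquire_in_reclaim(self, lock):
        """A task already in reclaim (e.g. kswapd) acquires `lock`."""
        self.taken_in_reclaim.add(lock.lock_class)
        return self.conflicts()

    def conflicts(self):
        """Classes seen on both sides; instances are never compared."""
        return sorted(self.held_over_reclaim & self.taken_in_reclaim)


# Two different lock instances, same (hypothetical) class.
dir_inode = Lock("directory inode", "i_mutex/fs_x")
file_inode = Lock("file inode", "i_mutex/fs_x")

checker = ToyLockdep()
checker.enter_reclaim([dir_inode])               # Process A: lookup -> d_alloc
report = checker.acquire_in_reclaim(file_inode)  # kswapd: ashmem_shrink_scan
print(report)  # ['i_mutex/fs_x']
```

In this model a report is produced even though the two instances are
never the same lock. If the two inodes had distinct classes, the
intersection would be empty and nothing would be reported.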
>
> What are the rules of acquiring two i_mutexes within a filesystem?
>
I am not fully sure. I am CC'ing Ted and linux-fsdevel as well for
any input on this question.
>> Neil tried to fix this sometime back:
>> https://www.mail-archive.com/linux-kernel@xxxxxxxxxxxxxxx/msg623909.html
>> but it was kind of NAK'ed.
>
> So that got nacked because Neil tried to fix it in the vfs core. Also
> not entirely sure that's the same problem.
Yes, a similar fix was proposed internally here. I would say the
signature of the problem reported there is quite similar (it's just
that there it's nfsd that is doing the reclaim instead of kswapd).
thanks,
- Joel
[1] https://www.mail-archive.com/linux-kernel@xxxxxxxxxxxxxxx/msg623986.html