Re: [PATCH] fs: Kill DCACHE_DONTCACHE dentry even if DCACHE_REFERENCED is set

From: Li, Hao
Date: Fri Aug 28 2020 - 05:04:30 EST


On 2020/8/28 8:35, Dave Chinner wrote:
> On Thu, Aug 27, 2020 at 05:58:07PM +0800, Li, Hao wrote:
>> On 2020/8/27 14:37, Dave Chinner wrote:
>>> On Fri, Aug 21, 2020 at 09:59:53AM +0800, Hao Li wrote:
>>>> Currently, DCACHE_REFERENCED prevents the dentry with DCACHE_DONTCACHE
>>>> set from being killed, so the corresponding inode can't be evicted. If
>>>> the DAX policy of an inode is changed, we can't make policy changing
>>>> take effects unless dropping caches manually.
>>>>
>>>> This patch fixes this problem and flushes the inode to disk to prepare
>>>> for evicting it.
>>>>
>>>> Signed-off-by: Hao Li <lihao2018.fnst@xxxxxxxxxxxxxx>
>>>> ---
>>>> fs/dcache.c | 3 ++-
>>>> fs/inode.c | 2 +-
>>>> 2 files changed, 3 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/fs/dcache.c b/fs/dcache.c
>>>> index ea0485861d93..486c7409dc82 100644
>>>> --- a/fs/dcache.c
>>>> +++ b/fs/dcache.c
>>>> @@ -796,7 +796,8 @@ static inline bool fast_dput(struct dentry *dentry)
>>>> */
>>>> smp_rmb();
>>>> d_flags = READ_ONCE(dentry->d_flags);
>>>> - d_flags &= DCACHE_REFERENCED | DCACHE_LRU_LIST | DCACHE_DISCONNECTED;
>>>> + d_flags &= DCACHE_REFERENCED | DCACHE_LRU_LIST | DCACHE_DISCONNECTED
>>>> + | DCACHE_DONTCACHE;
>>> Seems reasonable, but you need to update the comment above as to
>>> how this flag fits into this code....
>> Yes. I will change it. Thanks.
>>
>>>> /* Nothing to do? Dropping the reference was all we needed? */
>>>> if (d_flags == (DCACHE_REFERENCED | DCACHE_LRU_LIST) && !d_unhashed(dentry))
>>>> diff --git a/fs/inode.c b/fs/inode.c
>>>> index 72c4c347afb7..5218a8aebd7f 100644
>>>> --- a/fs/inode.c
>>>> +++ b/fs/inode.c
>>>> @@ -1632,7 +1632,7 @@ static void iput_final(struct inode *inode)
>>>> }
>>>>
>>>> state = inode->i_state;
>>>> - if (!drop) {
>>>> + if (!drop || (drop && (inode->i_state & I_DONTCACHE))) {
>>>> WRITE_ONCE(inode->i_state, state | I_WILL_FREE);
>>>> spin_unlock(&inode->i_lock);
>>> What's this supposed to do? We'll only get here with drop set if the
>>> filesystem is mounting or unmounting.
>> The variable drop will also be set to True if I_DONTCACHE is set on
>> inode->i_state.
>> Although mounting/unmounting will set the drop variable, it won't set
>> I_DONTCACHE if I understand correctly. As a result,
>> drop && (inode->i_state & I_DONTCACHE) will filter out mounting/unmounting.
> So what does this have to do with changing the way the dcache
> treats DCACHE_DONTCACHE?
After changing the way the dcache treats DCACHE_DONTCACHE, the dentry with
DCACHE_DONTCACHE set will be killed unconditionally even if
DCACHE_REFERENCED is set, and its inode will be processed by iput_final().
This inode has I_DONTCACHE flag, so op->drop_inode() will return true,
and the inode will be evicted _directly_ even though it has dirty pages.
I think the kernel will run into error state because it doesn't writeback
dirty pages of this inode before evicting. This is why I write back this
inode here.

According to my test, if I revert the second hunk of this patch, kernel
will hang after running the following command:
echo 123 > test.txt && xfs_io -c "chattr +x" test.txt

The backtrace is:

xfs_fs_destroy_inode+0x204/0x24d
destroy_inode+0x3b/0x65
evict+0x150/0x1aa
iput+0x117/0x19a
dentry_unlink_inode+0x12b/0x12f
__dentry_kill+0xee/0x211
dentry_kill+0x112/0x22f
dput+0x79/0x86
__fput+0x200/0x23f
____fput+0x25/0x28
task_work_run+0x144/0x177
do_exit+0x4fb/0xb94

This backtrace is printed with an ASSERT(0) statement in xfs_fs_destroy_inode().
>
> Also, if I_DONTCACHE is set, but the inode has also been unlinked or
> is unhashed, should we be writing it back? i.e. it might have been
> dropped for a different reason to I_DONTCACHE....
This is a problem I didn't realize. You are right. If the inode has been
unlinked/unhashed and the I_DONTCACHE is also set, the appended condition
will lead to unnecessary writeback.

I think I need to handle the inode writeback more carefully. Maybe I can
revert the second hunk and remove I_DONTCACHE from generic_drop_inode()
and then change

if (!drop && (sb->s_flags & SB_ACTIVE))

to

if (!drop && !(inode->i_state & I_DONTCACHE) && (sb->s_flags & SB_ACTIVE))

This approach would be more suitable.
>
> IOWs, if there is a problem with how I_DONTCACHE is being handled,
> then the problem must already exists regardless of the
> DCACHE_DONTCACHE behaviour, right? So shouldn't this be a separate
> bug fix with it's own explanation of the problem and the fix?
I do think the way we treat I_DONTCACHE in current kernel is not suitable.
generic_drop_inode() is used to determine if the inode should be evicted
without writeback, so I_DONTCACHE shouldn't be handled in this function.
The suitable approach is illustrated above. I can submit a patch to fix
this problem if this new approach is acceptable.

Thanks,
Hao Li
> Cheers,
>
> Dave.