Re: [PATCH] f2fs: avoid race between zero_range and background GC

From: Chao Yu
Date: Sat Jul 28 2018 - 23:03:13 EST


On 2018/7/29 10:59, Jaegeuk Kim wrote:
> On 07/29, Chao Yu wrote:
>> On 2018/7/29 10:02, Jaegeuk Kim wrote:
>>> On 07/27, Chao Yu wrote:
>>>> On 2018/7/27 18:29, Jaegeuk Kim wrote:
>>>>> On 07/26, Chao Yu wrote:
>>>>>> Thread A Background GC
>>>>>> - f2fs_zero_range
>>>>>> - truncate_pagecache_range
>>>>>> - gc_data_segment
>>>>>> - get_read_data_page
>>>>>> - move_data_page
>>>>>> - set_page_dirty
>>>>>> - set_cold_data
>>>>>> - f2fs_do_zero_range
>>>>>> - dn->data_blkaddr = NEW_ADDR;
>>>>>> - f2fs_set_data_blkaddr
>>>>>>
>>>>>> Actually, we don't need to set dirty & checked flag on the page, since
>>>>>> all valid data in the page should be zeroed by zero_range().
>>>>>
>>>>> But, it doesn't matter too much, right?
>>>>
>>>> No, if the dirtied page is writebacked after f2fs_do_zero_range(), result of
>>>> zero_range() should be wrong, as zeroed page contains valid user data.
>>>
>>> How about truncating page caches after block address change or doing it twice
>>> before and after?
>>
>> Thread A Background GC
>> - f2fs_zero_range
>> - truncate_pagecache_range
>> - gc_data_segment
>> - get_read_data_page
>> - move_data_page
>> - set_page_dirty
>> - set_cold_data
>> - f2fs_do_zero_range
>> - dn->data_blkaddr = NEW_ADDR;
>> - f2fs_set_data_blkaddr
>> bdi-flusher
>> - __write_data_page
>> - f2fs_update_data_blkaddr
>> : data_blkaddr has been updated here.
>> - truncate_pagecache_range
>> : data & dnode has been writebacked before page cache truncation?
>>
>> How about this case?
>
> So, truncating pages under dnode lock can address it?

Normally, our lock dependency is

->writepage()
lock data page -> lock dnode page

here
lock dnode page -> truncate_pagecache_range::lock data page

Will easily cause deadlock?

Thanks,

>
>>
>> Thanks,
>>
>>>
>>>>
>>>>>
>>>>>> Use i_gc_rwsem[WRITE] to avoid such race condition.
>>>>>
>>>>> Hope to avoid abusing i_gc_rwsem[] tho.
>>>>
>>>> Agreed, let's try avoiding until we have to use it.
>>>>
>>>> Thanks,
>>>>
>>>>>
>>>>>>
>>>>>> Signed-off-by: Chao Yu <yuchao0@xxxxxxxxxx>
>>>>>> ---
>>>>>> fs/f2fs/file.c | 2 ++
>>>>>> 1 file changed, 2 insertions(+)
>>>>>>
>>>>>> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
>>>>>> index 267ec3794e1e..7bd2412a8c37 100644
>>>>>> --- a/fs/f2fs/file.c
>>>>>> +++ b/fs/f2fs/file.c
>>>>>> @@ -1309,6 +1309,7 @@ static int f2fs_zero_range(struct inode *inode, loff_t offset, loff_t len,
>>>>>> if (ret)
>>>>>> return ret;
>>>>>>
>>>>>> + down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
>>>>>> down_write(&F2FS_I(inode)->i_mmap_sem);
>>>>>> ret = filemap_write_and_wait_range(mapping, offset, offset + len - 1);
>>>>>> if (ret)
>>>>>> @@ -1389,6 +1390,7 @@ static int f2fs_zero_range(struct inode *inode, loff_t offset, loff_t len,
>>>>>> }
>>>>>> out_sem:
>>>>>> up_write(&F2FS_I(inode)->i_mmap_sem);
>>>>>> + up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
>>>>>>
>>>>>> return ret;
>>>>>> }
>>>>>> --
>>>>>> 2.18.0.rc1