Re: [PATCH] f2fs: fix to wait IO writeback in __revoke_inmem_pages()

From: Chao Yu
Date: Tue May 08 2018 - 02:06:24 EST


On 2018/5/8 11:31, Jaegeuk Kim wrote:
> On 05/08, Chao Yu wrote:
>> On 2018/5/8 4:46, Jaegeuk Kim wrote:
>>> On 04/27, Chao Yu wrote:
>>>> On 2018/4/27 0:36, Jaegeuk Kim wrote:
>>>>> On 04/26, Chao Yu wrote:
>>>>>> On 2018/4/26 23:48, Jaegeuk Kim wrote:
>>>>>>> On 04/26, Chao Yu wrote:
>>>>>>>> Thread A Thread B
>>>>>>>> - f2fs_ioc_commit_atomic_write
>>>>>>>> - commit_inmem_pages
>>>>>>>> - f2fs_submit_merged_write_cond
>>>>>>>> : write data
>>>>>>>> - write_checkpoint
>>>>>>>> - do_checkpoint
>>>>>>>> : commit all node within CP
>>>>>>>> -> SPO
>>>>>>>> - f2fs_do_sync_file
>>>>>>>> - file_write_and_wait_range
>>>>>>>> : wait data writeback
>>>>>>>>
>>>>>>>> In above race condition, data/node can be flushed in reversed order when
>>>>>>>> coming a checkpoint before f2fs_do_sync_file, after SPOR, it results in
>>>>>>>> atomic written data being corrupted.
>>>>>>>
>>>>>>> Wait, what is the problem here? Thread B could succeed checkpoint, there is
>>>>>>> no problem. If it fails, there is no fsync mark where we can recover it, so
>>>>>>
>>>>>> Node is flushed by checkpoint before data, with reversed order, that's the problem.
>>>>>
>>>>> What do you mean? Data should be in disk, in order to proceed checkpoint.
>>>>
>>>> 1. thread A: commit_inmem_pages submit data into block layer, but haven't waited
>>>> it writeback.
>>>> 2. thread A: commit_inmem_pages update related node.
>>>> 3. thread B: do checkpoint, flush all nodes to disk
>>>
>>> How about, in block_operations(),
>>>
>>> down_read_trylock(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
>>> if (fail)
>>> wait_on_all_pages_writeback(F2FS_WB_DATA);
>>> else
>>> up_read(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
>>
>> I sent one patch for that, could you check it?
>>
>> Adding wait_on_all_pages_writeback in block_operations() can make checkpoint()
>> wait pages writeback one more time, which break IO flow, so what's your concern
>> here?
>
> Performance. And I can see wait_on_all_pages_writeback() waits only for
> F2FS_WB_CP_DATA in checkpoint()?

Oh, you mean wait all F2FS_WB_DATA pages writeback, what about just treating
atomic write page as F2FS_WB_CP_DATA, and we can wait atomic pages in
wait_on_all_pages_writeback() in do_checkpoitn().

Thanks,

>
>
>>
>> Thanks,
>>
>>>
>>>
>>>> 4. SPOR
>>>>
>>>> Then, atomic file becomes corrupted since nodes is flushed before data.
>>>>
>>>> Thanks,
>>>>
>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>>> we can just ignore the last written data as nothing.
>>>>>>>
>>>>>>>>
>>>>>>>> This patch adds f2fs_wait_on_page_writeback in __revoke_inmem_pages() to
>>>>>>>> keep data and node of atomic file being flushed orderly.
>>>>>>>>
>>>>>>>> Signed-off-by: Chao Yu <yuchao0@xxxxxxxxxx>
>>>>>>>> ---
>>>>>>>> fs/f2fs/file.c | 4 ++++
>>>>>>>> fs/f2fs/segment.c | 3 +++
>>>>>>>> 2 files changed, 7 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
>>>>>>>> index be7578774a47..a352804af244 100644
>>>>>>>> --- a/fs/f2fs/file.c
>>>>>>>> +++ b/fs/f2fs/file.c
>>>>>>>> @@ -217,6 +217,9 @@ static int f2fs_do_sync_file(struct file *file, loff_t start, loff_t end,
>>>>>>>>
>>>>>>>> trace_f2fs_sync_file_enter(inode);
>>>>>>>>
>>>>>>>> + if (atomic)
>>>>>>>> + goto write_done;
>>>>>>>> +
>>>>>>>> /* if fdatasync is triggered, let's do in-place-update */
>>>>>>>> if (datasync || get_dirty_pages(inode) <= SM_I(sbi)->min_fsync_blocks)
>>>>>>>> set_inode_flag(inode, FI_NEED_IPU);
>>>>>>>> @@ -228,6 +231,7 @@ static int f2fs_do_sync_file(struct file *file, loff_t start, loff_t end,
>>>>>>>> return ret;
>>>>>>>> }
>>>>>>>>
>>>>>>>> +write_done:
>>>>>>>> /* if the inode is dirty, let's recover all the time */
>>>>>>>> if (!f2fs_skip_inode_update(inode, datasync)) {
>>>>>>>> f2fs_write_inode(inode, NULL);
>>>>>>>> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
>>>>>>>> index 584483426584..9ca3d0a43d93 100644
>>>>>>>> --- a/fs/f2fs/segment.c
>>>>>>>> +++ b/fs/f2fs/segment.c
>>>>>>>> @@ -230,6 +230,8 @@ static int __revoke_inmem_pages(struct inode *inode,
>>>>>>>>
>>>>>>>> lock_page(page);
>>>>>>>>
>>>>>>>> + f2fs_wait_on_page_writeback(page, DATA, true);
>>>>>>>> +
>>>>>>>> if (recover) {
>>>>>>>> struct dnode_of_data dn;
>>>>>>>> struct node_info ni;
>>>>>>>> @@ -415,6 +417,7 @@ static int __commit_inmem_pages(struct inode *inode)
>>>>>>>> /* drop all uncommitted pages */
>>>>>>>> __revoke_inmem_pages(inode, &fi->inmem_pages, true, false);
>>>>>>>> } else {
>>>>>>>> + /* wait all committed IOs writeback and release them from list */
>>>>>>>> __revoke_inmem_pages(inode, &revoke_list, false, false);
>>>>>>>> }
>>>>>>>>
>>>>>>>> --
>>>>>>>> 2.15.0.55.gc2ece9dc4de6
>>>>>
>>>>> .
>>>>>
>>>
>>> .
>>>
>
> .
>