Re: [PATCH V4] f2fs: Avoid double lock for cp_rwsem during checkpoint
From: Chao Yu
Date: Fri May 08 2020 - 23:02:40 EST
On 2020/5/9 0:10, Jaegeuk Kim wrote:
> Hi Sayali,
>
> In order to address the perf regression, how about this?
>
> From 48418af635884803ffb35972df7958a2e6649322 Mon Sep 17 00:00:00 2001
> From: Jaegeuk Kim <jaegeuk@xxxxxxxxxx>
> Date: Fri, 8 May 2020 09:08:37 -0700
> Subject: [PATCH] f2fs: avoid double lock for cp_rwsem during checkpoint
>
> There could be a scenario where f2fs_sync_node_pages gets
> called during checkpoint, which in turn tries to flush
> inline data and calls iput(). This results in deadlock as
> iput() tries to hold cp_rwsem, which is already held at the
> beginning by checkpoint->block_operations().
>
> Call stack :
>
> Thread A                                Thread B
> f2fs_write_checkpoint()
> - block_operations(sbi)
>  - f2fs_lock_all(sbi);
>   - down_write(&sbi->cp_rwsem);
>
>                                         - open()
>                                          - igrab()
>                                         - write() write inline data
>                                         - unlink()
> - f2fs_sync_node_pages()
>  - if (is_inline_node(page))
>   - flush_inline_data()
>    - ilookup()
>      page = f2fs_pagecache_get_page()
>      if (!page)
>       goto iput_out;
>      iput_out:
>                                         - close()
>                                         - iput()
>        iput(inode);
>        - f2fs_evict_inode()
>         - f2fs_truncate_blocks()
>          - f2fs_lock_op()
>            - down_read(&sbi->cp_rwsem);
>
> Fixes: 2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data")
> Signed-off-by: Sayali Lokhande <sayalil@xxxxxxxxxxxxxx>
> Signed-off-by: Jaegeuk Kim <jaegeuk@xxxxxxxxxx>
> ---
> fs/f2fs/node.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
> index 1db8cabf727ef..626d7daca09de 100644
> --- a/fs/f2fs/node.c
> +++ b/fs/f2fs/node.c
> @@ -1870,8 +1870,8 @@ int f2fs_sync_node_pages(struct f2fs_sb_info *sbi,
>  				goto continue_unlock;
>  			}
>  
> -			/* flush inline_data */
> -			if (is_inline_node(page)) {
> +			/* flush inline_data, if it's not sync path. */
> +			if (do_balance && is_inline_node(page)) {
IIRC, this flow was designed to avoid running out of free space
during checkpoint:

2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data")

The scenario is:
1. create full node blocks
2. flush node blocks
3. write inline_data for all the node blocks again
4. flush node blocks redundantly

I guess this change may cause one of the fstests cases to fail.

Since block_operations->f2fs_sync_inode_meta has already synced the inode
cache to the inode page, could we check nlink in
block_operations->f2fs_sync_node_pages before calling flush_inline_data():

	if (is_inline_node(page)) {
		if (IS_INODE(page) && raw_inode_page->i_links) {
			flush_inline_data()
		}
	}

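To make the intent of the check above concrete, here is a minimal userspace sketch of the decision only. It is not kernel code: struct model_raw_inode, struct model_node_page, should_flush_inline_data() and run_examples() are hypothetical names invented for illustration, and the real f2fs structures and helpers look different. The assumption behind it is that an inode whose link count is already zero is the one whose final iput() would go on to evict and truncate, so it is the case to skip.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Userspace model only. These types are hypothetical stand-ins and do
 * not match the real f2fs on-disk or in-memory layout.
 */
struct model_raw_inode {
	uint32_t i_links;	/* link count as stored in the inode page */
};

struct model_node_page {
	bool inline_node;	/* models is_inline_node(page) */
	bool is_inode;		/* models IS_INODE(page) */
	struct model_raw_inode raw;
};

/*
 * Return true when flush_inline_data() would be called under the
 * proposed check: the page is flagged as an inline node, it is an
 * inode page, and the inode still has at least one link.
 */
bool should_flush_inline_data(const struct model_node_page *page)
{
	if (!page->inline_node)
		return false;

	/* Assumption: nlink == 0 means an unlinked inode, so skip it. */
	return page->is_inode && page->raw.i_links != 0;
}

/* Exercise the cases the proposal distinguishes. */
void run_examples(void)
{
	struct model_node_page linked = { true, true, { 1 } };
	struct model_node_page unlinked = { true, true, { 0 } };
	struct model_node_page not_inode = { true, false, { 1 } };
	struct model_node_page not_inline = { false, true, { 1 } };

	assert(should_flush_inline_data(&linked));
	assert(!should_flush_inline_data(&unlinked));
	assert(!should_flush_inline_data(&not_inode));
	assert(!should_flush_inline_data(&not_inline));
}
```

The sketch only models when the flush is attempted; it does not model the locking itself, nor whether another thread can drop the last link after the check.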
>  				clear_inline_node(page);
>  				unlock_page(page);
>  				flush_inline_data(sbi, ino_of_node(page));
>