On 07/02, Chao Yu wrote:
On 2021/7/2 1:10, Jaegeuk Kim wrote:
On 06/01, Chao Yu wrote:
[1] https://www.mail-archive.com/linux-f2fs-devel@xxxxxxxxxxxxxxxxxxxxx/msg15126.html
As reported in [1], if the lower device doesn't support write barriers,
the following sequence is possible:
- write page #0; persist
- overwrite page #0
- fsync
  - write data page #0 out-of-place (OPU) into the device's cache
  - write inode page into the device's cache
  - issue flush
Well, we have preflush for node writes, so I don't think this is the case.
fio.op_flags |= REQ_PREFLUSH | REQ_FUA;
This is only used for atomic write case, right?
I mean the common case which is called from f2fs_issue_flush() in
f2fs_do_sync_file().
How about adding PREFLUSH when writing node blocks, in line with the flags set above?
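To make the PREFLUSH/FUA argument concrete, here is a minimal userspace model of a volatile write cache. It is a sketch under stated assumptions, not kernel code: SIM_PREFLUSH and SIM_FUA are hypothetical stand-ins that only mirror the documented meaning of REQ_PREFLUSH (make previously completed writes durable before this one) and REQ_FUA (this write is on stable media once it completes). The model shows that a plain data write followed by a node write carrying both flags becomes durable in data-then-node order.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flags mirroring the meaning of REQ_PREFLUSH / REQ_FUA. */
#define SIM_PREFLUSH (1u << 0) /* make earlier completed writes durable first */
#define SIM_FUA      (1u << 1) /* this write is durable once it completes */
#define SIM_NR_BLKS  8

struct sim_dev {
	bool cached[SIM_NR_BLKS];     /* completed, but only in volatile cache */
	bool durable[SIM_NR_BLKS];    /* latest content is on stable media */
	int durable_seq[SIM_NR_BLKS]; /* when it became durable (1, 2, ...) */
	int seq;
};

static void sim_make_durable(struct sim_dev *d, int blk)
{
	d->cached[blk] = false;
	d->durable[blk] = true;
	d->durable_seq[blk] = ++d->seq;
}

/* Submit and complete one write of block blk with the given flags. */
static void sim_submit(struct sim_dev *d, int blk, unsigned int flags)
{
	assert(blk >= 0 && blk < SIM_NR_BLKS);
	if (flags & SIM_PREFLUSH) {
		/* Drain everything that completed before this request. */
		for (int i = 0; i < SIM_NR_BLKS; i++)
			if (d->cached[i])
				sim_make_durable(d, i);
	}
	if (flags & SIM_FUA) {
		sim_make_durable(d, blk);
	} else {
		/* New content sits only in the cache until a later flush. */
		d->durable[blk] = false;
		d->cached[blk] = true;
	}
}
```

For example, submitting block 0 with no flags and then block 1 with SIM_PREFLUSH | SIM_FUA leaves both blocks durable, block 0 first; two plain writes leave both only in the cache.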
And please see do_checkpoint(): we call f2fs_flush_device_cache() and
commit_checkpoint() separately to keep the persistence order of CP data.
See commit 46706d5917f4 ("f2fs: flush cp pack except cp pack 2 page at first")
for details.
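The ordering rule behind that split can be expressed as a small checker. This is a hypothetical helper, not f2fs code, assuming a device with a volatile cache and no barriers: 'W' is a plain write, 'F' a completed cache flush (as issued by f2fs_flush_device_cache()), and 'C' the commit write of the CP pack 2 page. A write is only guaranteed to be durable before the commit if a flush separates them.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical checker, not f2fs code.  events is a submission trace:
 *   'W' plain write that may linger in the volatile cache,
 *   'F' cache flush that completes before the next event is issued,
 *   'C' commit write (the CP pack 2 page).
 * Returns true if every write issued before 'C' is covered by a flush
 * that completes before 'C' is issued.
 */
static bool commit_is_ordered(const char *events)
{
	const char *commit;
	bool unflushed = false;

	assert(events != NULL);
	commit = strchr(events, 'C');
	if (!commit)
		return false;
	for (const char *p = events; p < commit; p++) {
		if (*p == 'W')
			unflushed = true;  /* may still be only in the cache */
		else if (*p == 'F')
			unflushed = false; /* everything before is now durable */
	}
	return !unflushed;
}
```

Under this model "WWWFC" is ordered while "WWWC" and "WFWC" are not, which is why the flush has to follow the whole CP pack and precede the commit write.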
Thanks,
If a sudden power-off (SPO) occurs during the flush command, the inode page
can be persisted before data page #0. After recovery, the inode is then
restored with the new physical block address of data page #0, but that
block may still contain dummy data.
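The failure window can be modelled in a few lines of C. This is a hypothetical sketch, not f2fs code: it assumes the flush drains exactly two cached writes (data page #0 and the inode page) in whatever order the device picks, and that SPO hits after `persisted` of them have reached the media. Corruption here means the recovered inode points at the new OPU block while that block never reached the media.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: the two writes sitting in the device cache. */
enum cached_write { DATA_PAGE0, INODE_PAGE };

/*
 * inode_first: the device happens to drain the inode page before data
 *              page #0 (allowed when there is no write barrier).
 * persisted:   how many of the two cached writes reach the media
 *              before SPO (0, 1 or 2).
 * Returns true if recovery would see an inode pointing at the new OPU
 * block of data page #0 while that block was never written.
 */
static bool corrupted_after_spo(bool inode_first, size_t persisted)
{
	const enum cached_write order[2] = {
		inode_first ? INODE_PAGE : DATA_PAGE0,
		inode_first ? DATA_PAGE0 : INODE_PAGE,
	};
	bool data_on_media = false, inode_on_media = false;

	assert(persisted <= 2);
	for (size_t i = 0; i < persisted; i++) {
		if (order[i] == DATA_PAGE0)
			data_on_media = true;
		else
			inode_on_media = true;
	}
	/* Old inode => old, already persisted block: fine. */
	return inode_on_media && !data_on_media;
}
```

Only the case where the inode page is drained first and SPO lands between the two writes is corrupting; every other combination recovers either the old or the new data.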
What the user then sees is that, after overwrite + fsync + SPO, the old data
in the file has been corrupted. Users who do care about this case can be
advised to use STRICT fsync mode: in this mode, we force a preflush command
to persist the data in the device cache prior to node writeback, which
avoids potential data corruption during fsync().
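As a sketch of the ordering this patch aims for, the function below records the device-level events of an OPU fsync() of a non-atomic file as a string ('D' data write, 'F' cache flush, 'N' node write). The enum and function names are illustrative assumptions and atomic files are left out of the model; in the patch itself the condition is F2FS_OPTION(sbi).fsync_mode == FSYNC_MODE_STRICT && !atomic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the fsync_mode mount option values. */
enum sim_fsync_mode { SIM_FSYNC_POSIX, SIM_FSYNC_STRICT };

/*
 * Record the device-level events of one OPU fsync() of a non-atomic
 * file into log: 'D' data write, 'F' cache flush, 'N' node write.
 * log needs room for at least 5 bytes.
 */
static void sim_fsync(enum sim_fsync_mode mode, char *log)
{
	size_t n = 0;

	assert(log != NULL);
	log[n++] = 'D';         /* OPU data page into the device cache */
	if (mode == SIM_FSYNC_STRICT)
		log[n++] = 'F'; /* preflush added by this patch */
	log[n++] = 'N';         /* node (inode) page writeback */
	log[n++] = 'F';         /* the usual flush at the end of fsync() */
	log[n] = '\0';
}

/* True if some flush sits between the data write and the node write. */
static bool data_flushed_before_node(const char *log)
{
	const char *d = strchr(log, 'D');
	const char *nd = strchr(log, 'N');

	if (!d || !nd || nd < d)
		return false;
	for (const char *p = d; p < nd; p++)
		if (*p == 'F')
			return true;
	return false;
}
```

In posix mode the only flush follows the node write, so nothing forces data page #0 to the media before the inode page; strict mode yields "DFNF", where the first flush provides that ordering.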
Signed-off-by: Chao Yu <yuchao0@xxxxxxxxxx>
---
v2:
- fix this by issuing an additional preflush command rather than using
the atomic write flow.
fs/f2fs/file.c | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 7d5311d54f63..238ca2a733ac 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -301,6 +301,20 @@ static int f2fs_do_sync_file(struct file *file, loff_t start, loff_t end,
f2fs_exist_written_data(sbi, ino, UPDATE_INO))
goto flush_out;
goto out;
+ } else {
+ /*
+ * In the OPU case, during fsync(), node can be persisted before
+ * data when the lower device doesn't support write barriers,
+ * resulting in data corruption after SPO.
+ * So in strict fsync mode, force a preflush to keep the
+ * data/node write order and avoid potential data corruption.
+ */
+ if (F2FS_OPTION(sbi).fsync_mode == FSYNC_MODE_STRICT &&
+ !atomic) {
+ ret = f2fs_issue_flush(sbi, inode->i_ino);
+ if (ret)
+ goto out;
+ }
}
go_write:
/*
--
2.29.2