Re: [PATCH] f2fs: disble physical prealloc in LSF mount

From: Javier GonzÃlez
Date: Mon Nov 25 2019 - 22:57:38 EST


On 26.11.2019 02:06, Damien Le Moal wrote:
On 2019/11/26 4:03, Javier GonzÃlez wrote:
On 25.11.2019 00:48, Damien Le Moal wrote:
On 2019/11/22 18:00, Javier GonzÃlez wrote:
From: Javier GonzÃlez <javier.gonz@xxxxxxxxxxx>

Fix file system corruption when using LFS mount (e.g., in zoned
devices). Seems like the fallback into buffered I/O creates an
inconsistency if the application is assuming both read and write DIO. I
can easily reproduce a corruption with a simple RocksDB test.

Might be that the f2fs_forced_buffered_io path brings some problems too,
but I have not seen other failures besides this one.

Problem reproducible without a zoned block device, simply by forcing
LFS mount:

$ sudo mkfs.f2fs -f -m /dev/nvme0n1
$ sudo mount /dev/nvme0n1 /mnt/f2fs
$ sudo /opt/rocksdb/db_bench --benchmarks=fillseq --use_existing_db=0
--use_direct_reads=true --use_direct_io_for_flush_and_compaction=true
--db=/mnt/f2fs --num=5000 --value_size=1048576 --verify_checksum=1
--block_size=65536

Note that the options that cause the problem are:
--use_direct_reads=true --use_direct_io_for_flush_and_compaction=true

Fixes: f9d6d0597698 ("f2fs: fix out-place-update DIO write")

Signed-off-by: Javier GonzÃlez <javier.gonz@xxxxxxxxxxx>
---
fs/f2fs/data.c | 3 ---
1 file changed, 3 deletions(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 5755e897a5f0..b045dd6ab632 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -1081,9 +1081,6 @@ int f2fs_preallocate_blocks(struct kiocb *iocb, struct iov_iter *from)
return err;
}

- if (direct_io && allow_outplace_dio(inode, iocb, from))
- return 0;

Since for LFS mode, all DIOs can end up out of place, I think that it
may be better to change allow_outplace_dio() to always return true in
the case of LFS mode. So may be something like:

static inline int allow_outplace_dio(struct inode *inode,
struct kiocb *iocb, struct iov_iter *iter)
{
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
int rw = iov_iter_rw(iter);

return test_opt(sbi, LFS) ||
(rw == WRITE && !block_unaligned_IO(inode, iocb, iter));
}

instead of the original:

static inline int allow_outplace_dio(struct inode *inode,
struct kiocb *iocb, struct iov_iter *iter)
{
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
int rw = iov_iter_rw(iter);

return (test_opt(sbi, LFS) && (rw == WRITE) &&
!block_unaligned_IO(inode, iocb, iter));
}

Thoughts ?


I see what you mean and it makes sense. However, the problem I am seeing
occurs when allow_outplace_dio() returns true, as this is what creates
the inconsistency between the write being buffered and the read being
DIO.

But if the write is switched to buffered, the DIO read should use the
buffered path too, no ? Since this is all happening under VFS, the
generic DIO read path will not ensure that the buffered writes are
flushed to disk before issuing the direct read, I think. So that would
explain your data corruption, i.e. you are reading stale data on the
device before the buffered writes make it to the media.


As far as I can see, the read is always sent DIO, so yes, I also believe
that we are reading stale data. This is why the corruption is not seen
if preventing allow_outplace_dio() from sending the write to the
buffered path.

What surprises me is that this is very easy to trigger (see commit), so
I assume you must have seen this with SMR in the past.

Does it make sense to leave the LFS check out of the
allow_outplace_dio()? Or in other words, is there a hard requirement for
writes to take this path on a zoned device that I am not seeing?
Something like:

static inline int allow_outplace_dio(struct inode *inode,
struct kiocb *iocb, struct iov_iter *iter)
{
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
int rw = iov_iter_rw(iter);

return (rw == WRITE && !block_unaligned_IO(inode, iocb, iter));
}

Thanks,
Javier