On Wed, Jan 24, 2024 at 02:26:40PM +0000, John Garry wrote:
Add flag IOMAP_ATOMIC_WRITE to indicate to the FS that an atomic write
bio is being created and all the rules there need to be followed.
It is the task of the FS iomap iter callbacks to ensure that the mapping
created adheres to those rules, like size is power-of-2, is at a
naturally-aligned offset, etc. However, checking for a single iovec, i.e.
iter type is ubuf, is done in __iomap_dio_rw().
A write should only produce a single bio, so error when it doesn't.
Signed-off-by: John Garry <john.g.garry@xxxxxxxxxx>
---
fs/iomap/direct-io.c | 21 ++++++++++++++++++++-
fs/iomap/trace.h | 3 ++-
include/linux/iomap.h | 1 +
3 files changed, 23 insertions(+), 2 deletions(-)
diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index bcd3f8cf5ea4..25736d01b857 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -275,10 +275,12 @@ static inline blk_opf_t iomap_dio_bio_opflags(struct iomap_dio *dio,
static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
struct iomap_dio *dio)
{
+ bool atomic_write = iter->flags & IOMAP_ATOMIC;
const struct iomap *iomap = &iter->iomap;
struct inode *inode = iter->inode;
unsigned int fs_block_size = i_blocksize(inode), pad;
loff_t length = iomap_length(iter);
+ const size_t iter_len = iter->len;
loff_t pos = iter->pos;
blk_opf_t bio_opf;
struct bio *bio;
@@ -381,6 +383,9 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
GFP_KERNEL);
bio->bi_iter.bi_sector = iomap_sector(iomap, pos);
bio->bi_ioprio = dio->iocb->ki_ioprio;
+ if (atomic_write)
+ bio->bi_opf |= REQ_ATOMIC;
This really ought to be in iomap_dio_bio_opflags. Unless you can't pass
REQ_ATOMIC to bio_alloc*, in which case there ought to be a comment
about why.
Also, what's the meaning of REQ_OP_READ | REQ_ATOMIC?
Does that
actually work? I don't know what that means, and "block: Add REQ_ATOMIC
flag" says that's not a valid combination. I'll complain about this
more below.
+
bio->bi_private = dio;
bio->bi_end_io = iomap_dio_bio_end_io;
@@ -397,6 +402,12 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
}
n = bio->bi_iter.bi_size;
+ if (atomic_write && n != iter_len) {
s/iter_len/orig_len/ ?
+ /* This bio should have covered the complete length */
+ ret = -EINVAL;
+ bio_put(bio);
+ goto out;
+ }
if (dio->flags & IOMAP_DIO_WRITE) {
task_io_account_write(n);
} else {
@@ -554,12 +565,17 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
struct blk_plug plug;
struct iomap_dio *dio;
loff_t ret = 0;
+ bool is_read = iov_iter_rw(iter) == READ;
+ bool atomic_write = (iocb->ki_flags & IOCB_ATOMIC) && !is_read;
Hrmm. So if the caller passes in an IOCB_ATOMIC iocb with a READ iter,
we'll silently drop IOCB_ATOMIC and do the read anyway? That seems like
a nonsense combination, but is that ok for some reason?
trace_iomap_dio_rw_begin(iocb, iter, dio_flags, done_before);
if (!iomi.len)
return NULL;
+ if (atomic_write && !iter_is_ubuf(iter))
+ return ERR_PTR(-EINVAL);
Does !iter_is_ubuf actually happen?
Why don't we support any of theIt's related how we can determine atomic_write_unit_max for the bdev.
other ITER_ types? Is it because hardware doesn't want vectored
buffers?
I really wish there was more commenting on /why/ we do things here:
if (iocb->ki_flags & IOCB_ATOMIC) {
/* atomic reads do not make sense */
if (iov_iter_rw(iter) == READ)
return ERR_PTR(-EINVAL);
/*
* block layer doesn't want to handle handle vectors of
* buffers when performing an atomic write i guess?
*/
if (!iter_is_ubuf(iter))
return ERR_PTR(-EINVAL);
iomi.flags |= IOMAP_ATOMIC;
}