Re: [PATCH v3 17/21] iomap: Atomic write support

From: Dave Chinner
Date: Tue Apr 30 2024 - 21:47:56 EST


On Mon, Apr 29, 2024 at 05:47:42PM +0000, John Garry wrote:
> Support atomic writes by producing a single BIO with REQ_ATOMIC flag set.
>
> We rely on the FS to guarantee extent alignment, such that an atomic write
> should never straddle two or more extents. The FS should also check for
> validity of an atomic write length/alignment.
>
> Signed-off-by: John Garry <john.g.garry@xxxxxxxxxx>
> ---
> fs/iomap/direct-io.c | 10 ++++++++++
> 1 file changed, 10 insertions(+)
>
> diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> index a3ed7cfa95bc..d7bdeb675068 100644
> --- a/fs/iomap/direct-io.c
> +++ b/fs/iomap/direct-io.c
> @@ -275,6 +275,7 @@ static inline blk_opf_t iomap_dio_bio_opflags(struct iomap_dio *dio,
> static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
> struct iomap_dio *dio)
> {
> + bool is_atomic = dio->iocb->ki_flags & IOCB_ATOMIC;
> const struct iomap *iomap = &iter->iomap;
> struct inode *inode = iter->inode;
> unsigned int zeroing_size, pad;
> @@ -387,6 +388,9 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
> bio->bi_iter.bi_sector = iomap_sector(iomap, pos);
> bio->bi_write_hint = inode->i_write_hint;
> bio->bi_ioprio = dio->iocb->ki_ioprio;
> + if (is_atomic)
> + bio->bi_opf |= REQ_ATOMIC;

REQ_ATOMIC is only valid for write IO, isn't it?

This should be added in iomap_dio_bio_opflags() after it is
determined we are doing a write operation. Regardless, it should be
added in iomap_dio_bio_opflags(), not here. That also allows us to
get rid of the is_atomic variable.
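
As a rough user-space illustration of that placement (not the actual iomap code: the flag values, struct, and function names below are stand-ins invented for this sketch), the atomic flag is only ever set once the helper has established that the IO is a write:

```c
#include <stdint.h>

/* Stand-in values for illustration only; these are NOT the kernel definitions. */
typedef uint32_t model_opf_t;
#define MODEL_OP_READ      0u
#define MODEL_OP_WRITE     1u
#define MODEL_ATOMIC_FLAG  (1u << 8)  /* models REQ_ATOMIC */
#define MODEL_IS_WRITE     (1u << 0)  /* models IOMAP_DIO_WRITE */
#define MODEL_IOCB_ATOMIC  (1u << 1)  /* models IOCB_ATOMIC */

/* Hypothetical, cut-down view of the state the opflags helper looks at. */
struct model_dio {
	unsigned int flags;     /* MODEL_IS_WRITE set for write IO */
	unsigned int ki_flags;  /* MODEL_IOCB_ATOMIC set for atomic requests */
};

/* Models the shape of an opflags helper: reads return early, so the
 * atomic flag can only be added on the write path. */
static model_opf_t model_dio_bio_opflags(const struct model_dio *dio)
{
	model_opf_t opflags;

	if (!(dio->flags & MODEL_IS_WRITE))
		return MODEL_OP_READ;

	opflags = MODEL_OP_WRITE;
	if (dio->ki_flags & MODEL_IOCB_ATOMIC)
		opflags |= MODEL_ATOMIC_FLAG;
	return opflags;
}
```

With this shape the caller no longer needs a separate is_atomic local; it can test the returned flags instead.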

> +
> bio->bi_private = dio;
> bio->bi_end_io = iomap_dio_bio_end_io;
>
> @@ -403,6 +407,12 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
> }
>
> n = bio->bi_iter.bi_size;
> + if (is_atomic && n != orig_count) {
> + /* This bio should have covered the complete length */
> + ret = -EINVAL;
> + bio_put(bio);
> + goto out;
> + }

What happens now if we've done zeroing IO before this? I suspect we
might expose stale data if the partial block zeroing converts the
unwritten extent in full...

> if (dio->flags & IOMAP_DIO_WRITE) {
> task_io_account_write(n);
> } else {

Ignoring the error handling issues, this code might be better as:

	if (dio->flags & IOMAP_DIO_WRITE) {
		if ((opflags & REQ_ATOMIC) && n != orig_count) {
			/* atomic writes are all or nothing */
			ret = -EIO;
			bio_put(bio);
			goto out;
		}
	}

so that we are not putting atomic write error checks in the read IO
submission path.
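
To make the intended behaviour concrete, here is a small stand-alone model of that check (the macro and function names are hypothetical and the flag values are placeholders, not kernel definitions). It only rejects a short bio when the IO is both a write and atomic, so reads never reach the atomic test:

```c
#include <errno.h>
#include <stddef.h>

/* Placeholder flag values for this sketch only. */
#define SKETCH_DIO_WRITE   (1u << 0)  /* models IOMAP_DIO_WRITE */
#define SKETCH_REQ_ATOMIC  (1u << 8)  /* models REQ_ATOMIC */

/* Returns 0 if the bio may be submitted, or -EIO if an atomic write
 * would have been split (n bytes built, orig_count bytes requested). */
static int sketch_check_atomic(unsigned int dio_flags, unsigned int opflags,
			       size_t n, size_t orig_count)
{
	if (dio_flags & SKETCH_DIO_WRITE) {
		/* atomic writes are all or nothing */
		if ((opflags & SKETCH_REQ_ATOMIC) && n != orig_count)
			return -EIO;
	}
	return 0;
}
```

This models only the decision itself; the bio_put() and the unwinding of any earlier zeroing IO are left out.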

-Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx