Re: [RFC PATCH v2 2/5] iomap: Add initial support for buffered RWF_WRITETHROUGH
From: Pankaj Raghav (Samsung)
Date: Mon Apr 20 2026 - 08:01:37 EST
> +
> + if (wt_ops->writethrough_submit)
> + wt_ops->writethrough_submit(wt_ctx->inode, iomap, wt_ctx->bio_pos,
> + len);
> +
> + bio = bio_alloc(iomap->bdev, wt_ctx->nr_bvecs, REQ_OP_WRITE, GFP_NOFS);
We might want to check whether bio_alloc() succeeded here, since bio is
dereferenced right below.
> + bio->bi_iter.bi_sector = iomap_sector(iomap, wt_ctx->bio_pos);
> + bio->bi_end_io = iomap_writethrough_bio_end_io;
> + bio->bi_private = wt_ctx;
> +
> + for (i = 0; i < wt_ctx->nr_bvecs; i++)
> + __bio_add_page(bio, wt_ctx->bvec[i].bv_page,
> + wt_ctx->bvec[i].bv_len,
> + wt_ctx->bvec[i].bv_offset);
> +
> + atomic_inc(&wt_ctx->ref);
> + submit_bio(bio);
> + wt_ctx->nr_bvecs = 0;
> +}
> +
<snip>
> +
> +/**
> + * iomap_writethrough_iter - perform RWF_WRITETHROUGH buffered write
> + * @wt_ctx: writethrough context
> + * @iter: iomap iter holding mapping information
> + * @i: iov_iter for write
> + * @wt_ops: the fs callbacks needed for writethrough
> + *
> + * This function copies the user buffer to folio similar to usual buffered
> + * IO path, with the difference that we immediately issue the IO. For this we
> + * utilize IO submission and completion mechanism that is inspired by dio.
> + *
> + * Folio handling note: We might be writing through a partial folio so we need
> + * to be careful to not clear the folio dirty bit unless there are no dirty blocks
> + * in the folio after the writethrough.
> + */
> +static int iomap_writethrough_iter(struct iomap_writethrough_ctx *wt_ctx,
> + struct iomap_iter *iter, struct iov_iter *i,
> + const struct iomap_writethrough_ops *wt_ops)
> +
> +{
> + ssize_t total_written = 0;
> + int status = 0;
> + struct address_space *mapping = iter->inode->i_mapping;
> + size_t chunk = mapping_max_folio_size(mapping);
> + unsigned int bdp_flags = (iter->flags & IOMAP_NOWAIT) ? BDP_ASYNC : 0;
> + unsigned int bs = i_blocksize(iter->inode);
> +
> + /* copied over based on DIO handles these flags */
> + if (iter->iomap.type == IOMAP_UNWRITTEN)
> + wt_ctx->flags |= IOMAP_DIO_UNWRITTEN;
> + if (iter->iomap.flags & IOMAP_F_SHARED)
> + wt_ctx->flags |= IOMAP_DIO_COW;
> +
> + if (!(iter->flags & IOMAP_WRITETHROUGH))
> + return -EINVAL;
> +
> + do {
> + struct folio *folio;
> + size_t offset; /* Offset into folio */
> + u64 bytes; /* Bytes to write to folio */
> + size_t copied; /* Bytes copied from user */
> + u64 written; /* Bytes have been written */
> + loff_t pos;
> + size_t off_aligned, len_aligned;
> +
> + bytes = iov_iter_count(i);
> +retry:
> + offset = iter->pos & (chunk - 1);
> + bytes = min(chunk - offset, bytes);
> + status = balance_dirty_pages_ratelimited_flags(mapping,
> + bdp_flags);
> + if (unlikely(status))
> + break;
> +
> + /*
> + * If completions already occurred and reported errors, give up
> + * now and don't bother submitting more bios.
> + */
> + if (unlikely(data_race(wt_ctx->error))) {
In the unlikely case that we hit an error here, do we also have to clear
the writeback flag on all the folios that are part of the bvecs
accumulated so far?
Something like: explicitly iterate over wt_ctx->bvec[0] through
wt_ctx->bvec[nr_bvecs - 1], call
folio_end_writeback(page_folio(bvec[i].bv_page)) on each of them, and then
discard the bvecs by setting nr_bvecs = 0.
I am wondering whether the folios processed so far will be left with
PG_writeback set, which can affect reclaim since we never clear the flag.
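To make the idea concrete, here is a rough userspace model of the cleanup I
have in mind. It is only a sketch and not kernel code: mock_folio,
mock_bvec, mock_wt_ctx and the mock_* helpers are hypothetical stand-ins for
struct folio, struct bio_vec, the writethrough context and
folio_end_writeback(). It also assumes each pending folio appears in exactly
one bvec, which may not hold in the actual patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MOCK_MAX_BVECS 16	/* hypothetical bound, not from the patch */

/* Hypothetical stand-in for struct folio. */
struct mock_folio {
	bool writeback;		/* models the PG_writeback flag */
};

/* Hypothetical stand-in for struct bio_vec. */
struct mock_bvec {
	struct mock_folio *folio;	/* kernel side: page_folio(bv_page) */
};

/* Hypothetical stand-in for the writethrough context. */
struct mock_wt_ctx {
	struct mock_bvec bvec[MOCK_MAX_BVECS];
	unsigned int nr_bvecs;
	int error;
};

/* Models folio_end_writeback(): clear the writeback state. */
static void mock_folio_end_writeback(struct mock_folio *folio)
{
	folio->writeback = false;
}

/*
 * Discard the bvecs accumulated so far, ending writeback on each folio
 * before dropping it. Returns the number of bvecs that were discarded.
 */
static unsigned int mock_wt_discard_bvecs(struct mock_wt_ctx *ctx)
{
	unsigned int i, n = ctx->nr_bvecs;

	assert(n <= MOCK_MAX_BVECS);

	for (i = 0; i < n; i++)
		mock_folio_end_writeback(ctx->bvec[i].folio);

	ctx->nr_bvecs = 0;
	return n;
}
```

In the error branch this would take the place of the bare
"wt_ctx->nr_bvecs = 0;" assignment, assuming the folios really are under
writeback at that point.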
> + wt_ctx->nr_bvecs = 0;
> + break;
> + }
> +
--
Pankaj