Re: [RFC 2/3] iomap: Enable stable writes for RWF_WRITETHROUGH inodes

From: Ojaswin Mujoo

Date: Wed Mar 11 2026 - 02:28:50 EST


On Tue, Mar 10, 2026 at 10:55:04AM +0530, Ritesh Harjani wrote:
> "Darrick J. Wong" <djwong@xxxxxxxxxx> writes:
>
> > On Mon, Mar 09, 2026 at 11:04:32PM +0530, Ojaswin Mujoo wrote:
> >> Currently, RWF_WRITETHROUGH writes wait for writeback to complete
> >> on a folio before performing the writethrough. This serializes
> >> writethrough with each other and the writeback path. However, it is also
> >> desirable to have similar guarantees between RWF_WRITETHROUGH and non
> >> writethrough writes.
> >>
> >> Hence, ensure stable writes are enabled on an inode's mapping as
> >> long as a writethrough write is ongoing. This way, all paths will
> >> wait for RWF_WRITETHROUGH to complete on a folio before proceeding.
> >>
> >> To track inflight writethrough writes, we use an atomic counter in the
> >> inode->i_mapping. This struct was chosen because (i) writethrough is an
> >> operation on the folio and (ii) we don't want to add bloat to struct
> >> inode.
>
> Now I am also questioning the need of this counter.
> If mapping has AS_STABLE_WRITES bit set, then that means the
> inode->mapping is going through stable writes until that bit is
> cleared. And since in future we are going to add support of async
> buffered write-through, so the stable writes bit should get cleared in
> the completion path (like how it is done now.)
>
> >
> > What if we just set it whenever someone successfully initiates a
> > RWF_WRITETHROUGH write? Then we wouldn't need all this atomic counter
> > machinery.
> >
>
> I agree. If we set the mapping as stable before initiating
> iomap_write_begin() itself, then we don't need this atomic counter.
>
> Maybe, we can set it in iomap_file_writethrough_write() itself
> (we have mapping available from iocb).

Hi Darrick, Ritesh,

Yes, I think we don't need the counter to know when to switch stable
writes on and off. Now that I think about it, maybe mapping-level
stable writes are too restrictive? I understand that certain hardware
needs them at the mapping level, but for cases like writethrough all we
need is for that particular folio to complete writeback. Why should we
serialize it with other non-overlapping writes?

Maybe implementing folio-level stable writes, or sprinkling around
folio_wait_writeback(), makes more sense?
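
To make the difference concrete, here is a rough userspace sketch (a toy
model, not kernel code). The struct and function names (wt_mapping,
wt_folio, wt_wait_*) and the per-folio "writethrough" flag are all
hypothetical. The mapping-level predicate is only meant to mirror the
shape of the existing folio_wait_stable() check, which waits on
writeback when the mapping has stable writes set.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for struct address_space and struct folio. */
struct wt_mapping {
	bool stable_writes;	/* models AS_STABLE_WRITES */
};

struct wt_folio {
	struct wt_mapping *mapping;
	bool writeback;		/* models the folio being under writeback */
	bool writethrough;	/* assumed per-folio marker, does not exist today */
};

/* Mapping-level rule: any folio under writeback blocks a new writer. */
static bool wt_wait_mapping_level(const struct wt_folio *folio)
{
	return folio->mapping->stable_writes && folio->writeback;
}

/* Folio-level rule: additionally wait on folios written through. */
static bool wt_wait_folio_level(const struct wt_folio *folio)
{
	if (!folio->writeback)
		return false;
	return folio->mapping->stable_writes || folio->writethrough;
}

/* Two folios of the same mapping, both under writeback. */
static void wt_demo(void)
{
	struct wt_mapping m = { .stable_writes = false };
	struct wt_folio wt = { .mapping = &m, .writeback = true, .writethrough = true };
	struct wt_folio plain = { .mapping = &m, .writeback = true, .writethrough = false };

	/* Folio-level: only the writethrough folio makes writers wait. */
	assert(wt_wait_folio_level(&wt));
	assert(!wt_wait_folio_level(&plain));

	/* Mapping-level flag held during the writethrough: both wait. */
	m.stable_writes = true;
	assert(wt_wait_mapping_level(&wt));
	assert(wt_wait_mapping_level(&plain));
}
```

In this model, holding the mapping-level flag for the duration of a
writethrough also stalls writers to the unrelated folio, which is the
extra serialization I'd like to avoid.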

Also, since we are on this topic, another thing that I should change is
where we call folio_mkclean(). Right now we call folio_mkclean() after
copying user data to the pagecache, which means there's a window where
an mmap write might change the data. I think we should proactively call
it before the memcpy?
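
A toy userspace model of the two orderings, just to illustrate the
window (not kernel code). All names here (toy_folio, toy_mkclean,
toy_mmap_store, toy_write_*) are made up, and "pte_writable" is a crude
stand-in for the folio having a writable user mapping. In the real
kernel, a store through a write-protected PTE would fault and go through
->page_mkwrite rather than silently fail as it does here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_FOLIO_SIZE 16

struct toy_folio {
	char data[TOY_FOLIO_SIZE];
	bool pte_writable;	/* models a writable mmap of this folio */
};

/* Models folio_mkclean(): write-protect the user mapping. */
static void toy_mkclean(struct toy_folio *f)
{
	f->pte_writable = false;
}

/* Models a racing mmap store; returns true if it reached the data. */
static bool toy_mmap_store(struct toy_folio *f, size_t off, char c)
{
	if (!f->pte_writable)
		return false;	/* would fault instead of modifying data */
	f->data[off] = c;
	return true;
}

/* Current ordering: copy first, write-protect afterwards. */
static bool toy_write_copy_first(struct toy_folio *f, const char *src, size_t len)
{
	bool raced;

	memcpy(f->data, src, len);
	raced = toy_mmap_store(f, 0, 'X');	/* store lands in the window */
	toy_mkclean(f);
	return raced;
}

/* Proposed ordering: write-protect first, then copy. */
static bool toy_write_mkclean_first(struct toy_folio *f, const char *src, size_t len)
{
	bool raced;

	toy_mkclean(f);
	raced = toy_mmap_store(f, 0, 'X');	/* refused before the copy */
	memcpy(f->data, src, len);
	raced |= toy_mmap_store(f, 0, 'X');	/* and refused after it */
	return raced;
}

static void toy_demo(void)
{
	struct toy_folio a = { .pte_writable = true };
	struct toy_folio b = { .pte_writable = true };

	assert(toy_write_copy_first(&a, "hello", 5));
	assert(a.data[0] == 'X');		/* copied data was modified */

	assert(!toy_write_mkclean_first(&b, "hello", 5));
	assert(memcmp(b.data, "hello", 5) == 0);	/* copied data intact */
}
```

With the copy-first ordering the racing store changes the freshly copied
data; with mkclean first it never reaches the folio in this model.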

>
>
> > Also: What if some filesystem (not xfs, obviously) finds a need to
> > change the stablepages bit while there might be writethrough writes in
> > progress?
>
> Is there a usecase where this can happen (just curious)?
>
> > It's a little awkward to have a flag /and/ a counter; why not
> > change mapping_{set,clear}_stable_pages to inc and dec the counter and
> > base the test off that?
> >
>
> Yes, either way, I agree that I don't see the need for an extra counter here.
>
> -ritesh