Re: [RFC PATCH 1/1] iomap: fix race between iomap_set_range_uptodate and folio_end_read

From: Joanne Koong

Date: Wed Feb 11 2026 - 18:15:16 EST


On Wed, Feb 11, 2026 at 1:03 PM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
>
> On Wed, Feb 11, 2026 at 11:33:05AM -0800, Joanne Koong wrote:
> > ifs->read_bytes_pending gets initialized to the folio size, but if the
> > file being read in is smaller than the size of the folio, then we
> > reach this scenario because the file has been read in but
> > ifs->read_bytes_pending is still a positive value because it
> > represents the bytes between the end of the file and the end of the
> > folio. If the folio size is 16k and the file size is 4k:
> > a) ifs->read_bytes_pending gets initialized to 16k
> > b) ->read_folio_range() is called for the 4k read
> > c) the 4k read succeeds, ifs->read_bytes_pending is now 12k and the
> > 0 to 4k range is marked uptodate
> > d) the post-eof blocks are zeroed and marked uptodate in the call to
> > iomap_set_range_uptodate()
>
> This is the bug then. If they're marked uptodate, read_bytes_pending
> should be decremented at the same time. Now, I appreciate that
> iomap_set_range_uptodate() is called both from iomap_read_folio_iter()
> and __iomap_write_begin(), and it can't decrement read_bytes_pending
> in the latter case. Perhaps a flag or a second length parameter is
> the solution?

I don't think it's enough to decrement read_bytes_pending by the
zeroed/read-inline length because there's these two edge cases:
a) some blocks in the folio were already uptodate from the very
beginning and skipped for IO but not decremented yet from
ifs->read_bytes_pending, which means in iomap_read_end(),
ifs->read_bytes_pending would be > 0 and the uptodate flag could get
XORed again. This means we need to also decrement read_bytes_pending
by bytes_submitted as well for this case
b) the async ->read_folio_range() callback finishes after the
zeroing's read_bytes_pending decrement and calls folio_end_read(), so
we need to assign ctx->cur_folio to NULL

I think the code would have to look something like [1] (this is
similar to the alternative approach I mentioned in my previous reply
but fixed up to cover some more edge cases).

Thanks,
Joanne

[1] https://github.com/joannekoong/linux/commit/b42f47726433a8130e8c27d1b43b16e27dfd6960

>
> > e) iomap_set_range_uptodate() sees all the ranges are marked
> > uptodate and it marks the folio uptodate
> > f) iomap_read_end() gets called to subtract the 12k from
> > ifs->read_bytes_pending. it too sees all the ranges are marked
> > uptodate and marks the folio uptodate
> >
> > The same scenario could happen for IOMAP_INLINE mappings if part of
> > the folio is read in through ->read_folio_range() and then the rest is
> > read in as inline data.
>
> This is basically the same case as post-eof.
>
> > An alternative solution is to not have zeroed-out / inlined mappings
> > call iomap_read_end(), eg something like this [1], but this adds
> > additional complexity and doesn't work if there's additional mappings
> > for the folio after a non-IOMAP_MAPPED mapping.

(I was wrong about it not working for cases where's additional
mappings after a non-IOMAP_MAPPED mapping, since both
inline-read/zeroing are no-ops if the entire folio is already
uptodate)

> >
> > Is there a better approach that I'm missing?
> >
> > Thanks,
> > Joanne
> >
> > [1] https://github.com/joannekoong/linux/commit/de48d3c29db8ae654300341e3eec12497df54673