Re: [RFC PATCH 1/1] iomap: fix race between iomap_set_range_uptodate and folio_end_read
From: Joanne Koong
Date: Wed Feb 11 2026 - 14:33:33 EST
On Tue, Feb 10, 2026 at 7:11 PM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
>
> On Tue, Feb 10, 2026 at 02:18:06PM -0800, Joanne Koong wrote:
> > spin_lock_irqsave(&ifs->state_lock, flags);
> > - uptodate = ifs_set_range_uptodate(folio, ifs, off, len);
> > + /*
> > + * If a read is in progress, we must NOT call
> > folio_mark_uptodate.
> > + * The read completion path (iomap_finish_folio_read or
> > + * iomap_read_end) will call folio_end_read() which uses XOR
> > + * semantics to set the uptodate bit. If we set it here, the XOR
> > + * in folio_end_read() will clear it, leaving the folio not
> > + * uptodate.
> > + */
> > + uptodate = ifs_set_range_uptodate(folio, ifs, off, len) &&
> > + !ifs->read_bytes_pending;
> > spin_unlock_irqrestore(&ifs->state_lock, flags);
>
> This can't possibly be the right fix. There's some horrible confusion
> here. It should not be possible to have read bytes pending _and_ the
> entire folio be uptodate. That's an invariant that should always be
> maintained.
ifs->read_bytes_pending gets initialized to the folio size, but if the
file being read in is smaller than the size of the folio, then we
reach this scenario because the file has been read in but
ifs->read_bytes_pending is still a positive value because it
represents the bytes between the end of the file and the end of the
folio. If the folio size is 16k and the file size is 4k:
a) ifs->read_bytes_pending gets initialized to 16k
b) ->read_folio_range() is called for the 4k read
c) the 4k read succeeds, ifs->read_bytes_pending is now 12k and the
0 to 4k range is marked uptodate
d) the post-eof blocks are zeroed and marked uptodate in the call to
iomap_set_range_uptodate()
e) iomap_set_range_uptodate() sees all the ranges are marked
uptodate and it marks the folio uptodate
f) iomap_read_end() gets called to subtract the 12k from
ifs->read_bytes_pending. it too sees all the ranges are marked
uptodate and marks the folio uptodate
The same scenario could happen for IOMAP_INLINE mappings if part of
the folio is read in through ->read_folio_range() and then the rest is
read in as inline data.
An alternative solution is to not have zeroed-out / inlined mappings
call iomap_read_end(), eg something like this [1], but this adds
additional complexity and doesn't work if there's additional mappings
for the folio after a non-IOMAP_MAPPED mapping.
Is there a better approach that I'm missing?
Thanks,
Joanne
[1] https://github.com/joannekoong/linux/commit/de48d3c29db8ae654300341e3eec12497df54673