Re: [RFC PATCH] fs/buffer: serialize set_buffer_uptodate against concurrent clears

Next message: Sarthak Sharma: "[PATCH] selftests: kselftest: fix wrong test number in ksft_exit_skip"
Previous message: Hannes Reinecke: "Re: [PATCH v2] nvme-auth: Include SC_C in RVAL controller hash"
In reply to: Matthew Wilcox: "Re: [RFC PATCH] fs/buffer: serialize set_buffer_uptodate against concurrent clears"
Next in thread: Matthew Wilcox: "Re: [RFC PATCH] fs/buffer: serialize set_buffer_uptodate against concurrent clears"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

From: Jan Kara

Date: Mon Apr 27 2026 - 07:28:05 EST

On Sun 26-04-26 03:42:38, Matthew Wilcox wrote:
> On Sat, Apr 25, 2026 at 10:01:37PM -0400, Chao Shi wrote:
> > A WARN_ON_ONCE(!buffer_uptodate(bh)) in mark_buffer_dirty() is reachable
> > from the buffered write path on a block device when the underlying
> > device returns I/O errors at high density. Reproduced by fuzzing an
> > NVMe controller (FEMU) that returns crafted error completions for a
> > sustained workload from /dev/nvme0n1.
> >
> > The race is:
> >
> > CPU A: block_commit_write (folio lock held) CPU B: end_buffer_async_read
> > set_buffer_uptodate(bh);
> > clear_buffer_uptodate(bh);
> > mark_buffer_dirty(bh); /* WARN fires */
>
> Why are we calling clear_buffer_uptodate() in end_buffer_async_read()?
> If the buffer is uptodate, we shouldn't be reading into it. If it's
> not uptodate, we don't need to clear the uptodate flag because it's
> already clear.
>
> I've been deleting calls to ClearPageUptodate and folio_clear_uptodate()
> from filesystems; it's almost always the wrong thing to do. But the
> buffer cache does have slightly different rules from the page cache,
> so this may not translate well.

Right, I think the clear_buffer_uptodate() call in end_buffer_async_read()
is indeed superfluous and we can just remove it - possibly replace with
WARN_ON_ONCE(buffer_uptodate(bh)) - to deal with this race.

That being said block_commit_write() could possibly race with
end_buffer_write_sync() as well and there the clear_buffer_uptodate()
should stay (well, unless we want to do a serious rework of IO error bh
handling). So the changes to block_commit_write() might still be needed.

Honza

--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR