Re: [PATCH] btrfs: fix subpage state mismatch in cow_fixup writeback path
From: David Sterba
Date: Mon May 25 2026 - 10:17:38 EST
On Mon, Mar 16, 2026 at 10:56:56AM +0000, Werner Kasselman wrote:
> writepage_delalloc() marks all dirty sectors as locked via
> btrfs_folio_set_lock(), setting bits in the subpage locked bitmap and
> incrementing nr_locked. These are cleaned up by
> btrfs_folio_end_lock_bitmap() at the end of extent_writepage().
>
> However, when btrfs_writepage_cow_fixup() returns -EAGAIN inside
> extent_writepage_io(), the code calls folio_unlock() directly and
> returns 1, causing extent_writepage() to skip the bitmap cleanup:
>
>     ret = btrfs_writepage_cow_fixup(folio);
>     if (ret == -EAGAIN) {
>         folio_redirty_for_writepage(bio_ctrl->wbc, folio);
>         folio_unlock(folio);    // doesn't clear locked bitmap
>         return 1;               // caller skips end_lock_bitmap()
>     }
>
> This leaves the subpage locked bitmap out of sync with the folio lock
> state: the folio is unlocked but its subpage locked bitmap still has
> bits set and nr_locked is elevated. When writeback retries the folio,
> btrfs_folio_set_lock() hits the ASSERT at subpage.c:746 because the
> bits are still set from the previous attempt.
>
> The cow_fixup path is largely a legacy path -- the GUP dirty-without-
> informing-fs issue that triggered it has been fixed on the GUP side,
> and experimental builds already catch this case with -EUCLEAN before
> reaching the -EAGAIN return. However the subpage state mismatch is
> still a correctness issue for non-experimental builds under error
> injection or memory pressure (kzalloc failure in
> btrfs_writepage_cow_fixup()).
>
> Fix this by replacing folio_unlock() with btrfs_folio_end_lock_bitmap(),
> which properly clears the locked bitmap bits before unlocking. For
> non-subpage or when nr_locked is 0 (e.g. called from
> extent_write_locked_range()), btrfs_folio_end_lock_bitmap() falls
> through to plain folio_unlock(), so existing behavior is preserved.
>
> Fixes: d034cdb4cc8a ("btrfs: lock subpage ranges in one go for writepage_delalloc()")
> CC: stable@xxxxxxxxxxxxxxx
> Signed-off-by: Werner Kasselman <werner@xxxxxxxxxxx>
I'm going through the patch backlog, and this patch still has some
relevance. We're going to remove the fixup worker code completely in 7.2,
so it can no longer be applied to the development branch.

The problems are hard to hit or need error injection, so I don't know if
it's worth backporting to stable. We've given the fixup worker a long
grace period before removal and I'm glad we can delete it and forget
about it. If somebody wants one last fix then I'm OK with that.
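
For anyone reading this later without the code at hand, the mismatch
described in the quoted changelog can be sketched as a small userspace
model. This is a toy under stated assumptions, not kernel code: struct
toy_folio and every toy_*() function are hypothetical stand-ins that only
mimic the behavior the changelog describes (a locked bitmap plus nr_locked
next to the folio lock), and the 4-sector dirty mask is arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy userspace model, NOT kernel code. All names are hypothetical. */
struct toy_folio {
	bool locked;            /* stands in for the folio lock */
	uint16_t locked_bitmap; /* one bit per sector, like the subpage locked bitmap */
	int nr_locked;          /* number of sectors currently marked locked */
};

static int toy_count_bits(uint16_t v)
{
	int n = 0;

	while (v) {
		n += v & 1;
		v >>= 1;
	}
	return n;
}

/*
 * Mark sectors as locked. Returns false where the changelog says the real
 * code hits its ASSERT: some requested bit is already set.
 */
static bool toy_set_lock(struct toy_folio *f, uint16_t sectors)
{
	if (f->locked_bitmap & sectors)
		return false;
	f->locked_bitmap |= sectors;
	f->nr_locked += toy_count_bits(sectors);
	return true;
}

/* Plain unlock: drops the folio lock, leaves bitmap and nr_locked alone. */
static void toy_folio_unlock(struct toy_folio *f)
{
	f->locked = false;
}

/*
 * Bitmap-aware unlock: clear the given sectors, then drop the folio lock
 * once no sector is marked locked. With nr_locked already 0 this behaves
 * like the plain unlock.
 */
static void toy_end_lock_bitmap(struct toy_folio *f, uint16_t sectors)
{
	uint16_t cleared = f->locked_bitmap & sectors;

	f->locked_bitmap &= (uint16_t)~cleared;
	f->nr_locked -= toy_count_bits(cleared);
	if (f->nr_locked == 0)
		f->locked = false;
}

/*
 * One writeback attempt that bails out on the -EAGAIN path, followed by a
 * retry. Returns true if the retry would trip the assertion.
 */
static bool toy_retry_trips_assert(bool use_bitmap_unlock)
{
	struct toy_folio f = { 0 };
	const uint16_t dirty = 0x000f; /* arbitrary: first four sectors dirty */

	f.locked = true;
	if (!toy_set_lock(&f, dirty))
		return true;

	if (use_bitmap_unlock)
		toy_end_lock_bitmap(&f, dirty); /* behavior proposed by the patch */
	else
		toy_folio_unlock(&f);           /* behavior the patch replaces */

	f.locked = true;                        /* writeback retries the folio */
	return !toy_set_lock(&f, dirty);
}
```

In this model toy_retry_trips_assert(false) reports the stale-bits case on
the retry, while toy_retry_trips_assert(true) does not, which is the
difference the patch is after.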