Re: [PATCH] btrfs: fix subpage locked bitmap leak in cow_fixup writeback path

From: Qu Wenruo

Date: Mon Mar 16 2026 - 06:08:01 EST




On 2026/3/16 20:27, Werner Kasselman wrote:
Hi,

While auditing the btrfs subpage writeback path in 7.0-rc4, I found a locked bitmap leak in extent_writepage_io() when btrfs_writepage_cow_fixup() returns -EAGAIN.

Since commit c96d0e392141 ("btrfs: mark all dirty sectors as locked inside writepage_delalloc()"), writepage_delalloc() calls btrfs_folio_set_lock() for all dirty sectors, setting bits in the subpage locked bitmap and incrementing nr_locked. These are normally cleaned up by btrfs_folio_end_lock_bitmap() at the end of extent_writepage().

However, when cow_fixup returns -EAGAIN, extent_writepage_io() calls plain folio_unlock() and returns 1.  This causes extent_writepage() to skip btrfs_folio_end_lock_bitmap() entirely.  The locked bitmap bits and nr_locked counter are leaked.

You (or the LLM) are right, although I wouldn't call it a bitmap leak; it is rather a mismatch between the subpage state and the folio status.

The folio is unlocked but its subpage bitmap is not.

However, this is already very much a corner case: the cow fixup path can only be triggered when a dirty range is submitted without an ordered extent.

It's possible to trigger, but only with certain error injection, and even then it is more likely to hit the EUCLEAN case rather than the EAGAIN one.

Just look at the EXPERIMENTAL feature check inside that function.


When writeback retries the folio, btrfs_folio_set_lock() tries to set the same bits again and hits the ASSERT at subpage.c:746.

The fix replaces folio_unlock() with btrfs_folio_end_lock_bitmap(), which clears the locked bitmap bits before unlocking.  For non-subpage configurations, btrfs_folio_end_lock_bitmap() falls through to plain folio_unlock(), so behavior is unchanged.
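
As a rough illustration of the state mismatch discussed above, here is a minimal userspace sketch. It is not the test program mentioned in this thread and it is not kernel code: the toy_* names, the struct layout, and the 16-bit bitmap (16 sectors of 4K in a 64K page) are all assumptions made for the example. It only models the behavior as described in this mail: setting lock bits that are already set is treated as the ASSERT failure, a plain unlock leaves the bitmap alone, and the bitmap-aware unlock clears the bits first.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model: names and layout are hypothetical, not the kernel's structures. */
struct toy_folio {
	bool locked;            /* stands in for the folio lock */
	uint16_t locked_bitmap; /* one bit per 4K sector of a 64K page */
	int nr_locked;          /* count of bits set in locked_bitmap */
};

static int toy_count_bits(uint16_t mask)
{
	int n = 0;

	for (; mask; mask >>= 1)
		n += mask & 1;
	return n;
}

/* Models btrfs_folio_set_lock(); returns false where the kernel would ASSERT. */
static bool toy_set_lock(struct toy_folio *f, uint16_t mask)
{
	if (f->locked_bitmap & mask)
		return false;
	f->locked_bitmap |= mask;
	f->nr_locked += toy_count_bits(mask);
	return true;
}

/* Models plain folio_unlock(): the subpage state is left untouched. */
static void toy_folio_unlock(struct toy_folio *f)
{
	f->locked = false;
}

/* Models btrfs_folio_end_lock_bitmap(): clear the bits, unlock at zero. */
static void toy_end_lock_bitmap(struct toy_folio *f, uint16_t mask)
{
	uint16_t cleared = f->locked_bitmap & mask;

	f->locked_bitmap &= (uint16_t)~cleared;
	f->nr_locked -= toy_count_bits(cleared);
	if (f->nr_locked == 0)
		f->locked = false;
}

/* One -EAGAIN round trip followed by a retry; true if the retry can lock. */
static bool toy_writeback_retry_ok(bool use_fix)
{
	struct toy_folio f = { .locked = true };
	const uint16_t dirty = 0x000f; /* four dirty sectors, chosen arbitrarily */

	(void)toy_set_lock(&f, dirty);  /* first pass marks dirty sectors locked */
	if (use_fix)
		toy_end_lock_bitmap(&f, dirty);
	else
		toy_folio_unlock(&f);

	f.locked = true;                /* writeback retries and relocks the folio */
	return toy_set_lock(&f, dirty);
}
```

In this model, toy_writeback_retry_ok(false) returns false because the stale bits from the first pass are still set when the retry tries to set them again, while toy_writeback_retry_ok(true) returns true because the bitmap was cleared before the folio was unlocked.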

This affects subpage configurations (sectorsize < PAGE_SIZE, e.g. 4K sector on 64K-page ARM64) when the cow_fixup path triggers, which happens when a page is dirtied via GUP/pin_user_pages without going through the filesystem.

IIRC that is no longer the case; the bug where a page gets dirtied without informing the fs has already been fixed on the GUP side.


Finally, please send the patch properly through git-send-email, not as an attachment.

How do you expect one to review an attachment?

Thanks,
Qu


I also wrote a standalone bitmap state machine simulation that reproduces the invariant violation and verifies the fix:

  tools/testing/btrfs/test-subpage-cow-fixup.c

  $ cc -o test tools/testing/btrfs/test-subpage-cow-fixup.c && ./test
  TEST 1: Buggy path (folio_unlock)                ... PASS (bug reproduced)
  TEST 2: Fixed path (end_lock_bitmap)             ... PASS
  TEST 3: Non-subpage path (both behave)           ... PASS
  TEST 4: Partial async + cow_fixup (buggy)        ... PASS (bug reproduced)
  TEST 5: Partial async + cow_fixup (fixed)        ... PASS

Please review.

Thanks,

Werner