Re: [PATCH v2] btrfs: allocate eb-attached btree pages as movable
From: David Sterba
Date: Wed May 27 2026 - 09:09:41 EST
On Tue, May 26, 2026 at 06:37:39PM -0400, Rik van Riel wrote:
> Extent buffer pages allocated by alloc_extent_buffer() are attached to
> btree_inode->i_mapping (the buffer_tree path), reach the LRU, and are
> served by the btree_migrate_folio aops in fs/btrfs/disk-io.c. They are
> migratable in practice once their owning extent buffer hits refs == 1,
> which happens naturally. The buddy allocator classifies them by GFP,
> however, and bare GFP_NOFS lands them in MIGRATE_UNMOVABLE pageblocks.
> The result: every btree_inode page we read in pins an unmovable pageblock
> from the page-superblock allocator's perspective, even though the page
> itself can be moved.
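The classification problem described above can be shown with a small userspace model of the idea behind the kernel's gfp_migratetype(). The migratetype comes only from the mobility bits of the GFP mask, so a mask with no mobility hint is filed as unmovable whether or not the page could later be migrated. All MODEL_/model_ names and all bit values are assumptions made for this sketch. They are not the definitions in include/linux/gfp_types.h.

```c
#include <assert.h>

/*
 * Userspace model of GFP-based migratetype classification.
 * Bit values are illustrative for this sketch, not the kernel's.
 */
typedef unsigned int model_gfp_t;

#define MODEL_GFP_MOVABLE        0x08u  /* mobility hint: page can be migrated */
#define MODEL_GFP_RECLAIMABLE    0x10u  /* mobility hint: page can be reclaimed */
#define MODEL_GFP_MOBILITY_MASK  (MODEL_GFP_MOVABLE | MODEL_GFP_RECLAIMABLE)
#define MODEL_GFP_MOBILITY_SHIFT 3
#define MODEL_GFP_NOFS           0x100u /* stand-in for GFP_NOFS: no mobility bits */
#define MODEL_GFP_NOFAIL         0x200u /* stand-in for __GFP_NOFAIL: no mobility bits */

enum model_migratetype {
	MODEL_MIGRATE_UNMOVABLE = 0,   /* no mobility bits set */
	MODEL_MIGRATE_MOVABLE = 1,     /* MODEL_GFP_MOVABLE >> shift */
	MODEL_MIGRATE_RECLAIMABLE = 2, /* MODEL_GFP_RECLAIMABLE >> shift */
};

/*
 * Classify an allocation purely from its GFP mask. Whether the page
 * could actually be migrated later plays no part in the decision.
 */
static enum model_migratetype model_gfp_migratetype(model_gfp_t gfp)
{
	/* Setting both mobility hints at once is treated as a caller bug. */
	assert((gfp & MODEL_GFP_MOBILITY_MASK) != MODEL_GFP_MOBILITY_MASK);
	return (enum model_migratetype)
		((gfp & MODEL_GFP_MOBILITY_MASK) >> MODEL_GFP_MOBILITY_SHIFT);
}
```

In this model, MODEL_GFP_NOFS | MODEL_GFP_NOFAIL maps to MODEL_MIGRATE_UNMOVABLE. Adding MODEL_GFP_MOVABLE maps it to MODEL_MIGRATE_MOVABLE. That single bit is the difference the patch introduces at the alloc_extent_buffer() call site.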
>
> Have each caller of btrfs_alloc_page_array, btrfs_alloc_folio_array,
> and alloc_eb_folio_array pass in the full GFP mask directly, instead
> of having the functions calculate it from boolean flags.
>
> The alloc_extent_buffer call site passes GFP_NOFS | __GFP_NOFAIL |
> __GFP_MOVABLE. All other call sites pass plain GFP_NOFS.
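The calling convention the patch moves to can be sketched with a simplified userspace stand-in. The hypothetical model_alloc_page_array() takes the complete mask from its caller instead of deriving one from boolean flags, so only the extent-buffer path adds the movable hint. The names, flag values, and malloc-backed "pages" are illustrative assumptions. The real btrfs helpers and their signatures are not reproduced here.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

typedef unsigned int model_gfp_t;

/* Illustrative flag bits for this sketch only. */
#define MODEL_GFP_MOVABLE 0x08u
#define MODEL_GFP_NOFS    0x100u
#define MODEL_GFP_NOFAIL  0x200u

/* Stand-in for a page: it remembers the mask it was allocated with. */
struct model_page {
	model_gfp_t gfp;
};

/*
 * Hypothetical array allocator: the caller supplies the full GFP mask.
 * On failure, frees everything this call allocated and returns -ENOMEM.
 */
static int model_alloc_page_array(unsigned int nr, struct model_page **pages,
				  model_gfp_t gfp)
{
	assert(pages != NULL);
	for (unsigned int i = 0; i < nr; i++) {
		pages[i] = malloc(sizeof(*pages[i]));
		if (!pages[i]) {
			/* Undo the partial allocation before reporting failure. */
			while (i-- > 0) {
				free(pages[i]);
				pages[i] = NULL;
			}
			return -ENOMEM;
		}
		pages[i]->gfp = gfp;
	}
	return 0;
}

/* Extent-buffer path: these pages reach the btree inode's LRU, so tag them movable. */
static int model_alloc_eb_pages(unsigned int nr, struct model_page **pages)
{
	return model_alloc_page_array(nr, pages,
				      MODEL_GFP_NOFS | MODEL_GFP_NOFAIL | MODEL_GFP_MOVABLE);
}

/* Private-buffer paths (raid56, scrub, encoded reads, ...): plain NOFS. */
static int model_alloc_private_pages(unsigned int nr, struct model_page **pages)
{
	return model_alloc_page_array(nr, pages, MODEL_GFP_NOFS);
}
```

Keeping the mask at the call site means that a future caller meeting the migratability contract only has to change its own argument. It does not need to add another boolean to the helper.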
>
> Three categories of caller stay on bare GFP_NOFS, deliberately:
>
> - alloc_dummy_extent_buffer / btrfs_clone_extent_buffer: the
> resulting eb is EXTENT_BUFFER_UNMAPPED, folio->mapping stays NULL,
> the folios never enter LRU, never get migrate_folio aops. Tagging
> them __GFP_MOVABLE would violate the page allocator's migratability
> contract and they would defeat compaction in MOVABLE pageblocks
> where isolate_migratepages_block skips non-LRU non-movable_ops
> pages outright.
>
> - btrfs_alloc_page_array callers in fs/btrfs/raid56.c (stripe
> pages), fs/btrfs/inode.c (encoded reads), fs/btrfs/ioctl.c (uring
> encoded reads), fs/btrfs/relocation.c (relocation buffers): same
> contract violation. raid56 stripe_pages additionally persist in
> the stripe cache (RBIO_CACHE_SIZE=1024) well beyond a single I/O,
> so they are not transient enough to hand-wave the contract.
>
> - btrfs_alloc_folio_array caller in fs/btrfs/scrub.c (stripe
> folios): same -- stripe->folios[] are private buffers freed via
> folio_put in release_scrub_stripe.
>
> This change targets the dominant fragmentation source observed on the
> page-superblock series: ~28 GB of btree_inode pages parked across
> many tainted superpageblocks on a 250 GB test system with btrfs root,
> preventing 1 GiB hugepage allocation from those regions. With the
> movable hint, those pages now land in MOVABLE pageblocks where the
> existing background defragger drains them through the standard
> PB_has_movable gate, with no LRU-sample fallback needed.
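The following assumes 2 MiB pageblocks, which is typical for x86-64 with 4 KiB base pages. Under that assumption, a 1 GiB region spans 512 pageblocks. A single pageblock pinned by unmovable pages is enough to keep the whole region from being compacted into a 1 GiB page. The sketch below counts such regions. The function name and the per-pageblock boolean map are hypothetical, and the model ignores everything else compaction considers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumes 2 MiB pageblocks, as on a typical x86-64 config with 4 KiB pages. */
#define MODEL_PAGEBLOCK_BYTES   (2UL << 20)
#define MODEL_REGION_BYTES      (1UL << 30)
#define MODEL_BLOCKS_PER_REGION (MODEL_REGION_BYTES / MODEL_PAGEBLOCK_BYTES) /* 512 */

/*
 * Count 1 GiB regions that contain at least one pageblock holding
 * unmovable pages. In this model, one such pageblock rules the whole
 * region out as a 1 GiB hugepage candidate.
 */
static size_t model_count_blocked_regions(const bool *pb_has_unmovable,
					  size_t nr_blocks)
{
	size_t blocked = 0;

	assert(nr_blocks % MODEL_BLOCKS_PER_REGION == 0);
	for (size_t r = 0; r < nr_blocks / MODEL_BLOCKS_PER_REGION; r++) {
		for (size_t b = 0; b < MODEL_BLOCKS_PER_REGION; b++) {
			if (pb_has_unmovable[r * MODEL_BLOCKS_PER_REGION + b]) {
				blocked++;
				break; /* one tainted pageblock is enough */
			}
		}
	}
	return blocked;
}
```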
>
> Cc: Chris Mason <clm@xxxxxxxx>
> Cc: David Sterba <dsterba@xxxxxxxx>
> Cc: Boris Burkov <boris@xxxxxx>
> Cc: linux-btrfs@xxxxxxxxxxxxxxx
> Signed-off-by: Rik van Riel <riel@xxxxxxxxxxx>
> Assisted-by: Claude:claude-opus-4-6
> ---
> v2: pass the gfp mask directly to each function from the callers (thanks Boris)
Added to for-next, thanks.