Re: [RESEND PATCH v2] btrfs: prevent direct reclaim during compressed readahead

From: David Sterba

Date: Mon Mar 30 2026 - 15:57:05 EST


On Sat, Mar 28, 2026 at 02:46:19PM -0700, JP Kobryn (Meta) wrote:
> Under memory pressure, direct reclaim can kick in during compressed
> readahead, putting the associated task into D-state. shrink_lruvec()
> then disables interrupts while acquiring the LRU lock. Under heavy
> pressure, we've observed reclaim running long enough that the CPU
> becomes prone to CSD lock stalls, since it cannot service incoming
> IPIs. Although CSD lock stalls are the worst-case scenario, we have
> found many more subtle occurrences of this latency, on the order of
> seconds and over a minute in some cases.
>
> Prevent direct reclaim during compressed readahead. This is achieved by
> using different GFP flags at key points when the bio is marked for
> readahead.
>
> There are two functions that allocate during compressed readahead:
> btrfs_alloc_compr_folio() and add_ra_bio_pages(). Both currently use
> GFP_NOFS which includes __GFP_DIRECT_RECLAIM.
>
> For the internal API call btrfs_alloc_compr_folio(), the signature changes
> to accept an additional gfp_t parameter. At the readahead call site, it
> gets flags similar to GFP_NOFS but stripped of __GFP_DIRECT_RECLAIM.
> __GFP_NOWARN is added since these allocations are allowed to fail. Demand
> reads still use full GFP_NOFS and will enter reclaim if needed. All other
> existing call sites of btrfs_alloc_compr_folio() now explicitly pass
> GFP_NOFS to retain their current behavior.
>
> add_ra_bio_pages() gains a bool parameter that lets callers specify
> whether direct reclaim is allowed. In either case, __GFP_NOWARN is
> added since the allocations are speculative.
>
> There has been previous work on reducing how often add_ra_bio_pages()
> is called [0]. This patch is complementary: where that patch reduces
> call frequency, this one reduces the latency of the calls that remain.
>
> [0] https://lore.kernel.org/linux-btrfs/656838ec1232314a2657716e59f4f15a8eadba64.1751492111.git.boris@xxxxxx/
>
> Signed-off-by: JP Kobryn (Meta) <jp.kobryn@xxxxxxxxx>
> Reviewed-by: Mark Harmstone <mark@xxxxxxxxxxxxx>
> ---
> v2:
> - dropped patch 1/2, squashed into single patch based on David's feedback
> - changed btrfs_alloc_compr_folio() signature instead of new _gfp variant
> - update other existing callers to pass GFP_NOFS explicitly
>
> v1: https://lore.kernel.org/linux-btrfs/20260320073445.80218-1-jp.kobryn@xxxxxxxxx/
>
> fs/btrfs/compression.c | 42 +++++++++++++++++++++++++++++++++++-------
> fs/btrfs/compression.h | 2 +-
> fs/btrfs/inode.c | 2 +-
> fs/btrfs/lzo.c | 6 +++---
> fs/btrfs/zlib.c | 6 +++---
> fs/btrfs/zstd.c | 6 +++---
> 6 files changed, 46 insertions(+), 18 deletions(-)
>
> diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
> index e897342bece1f..8f33ef48b501e 100644
> --- a/fs/btrfs/compression.c
> +++ b/fs/btrfs/compression.c
> @@ -180,7 +180,7 @@ static unsigned long btrfs_compr_pool_scan(struct shrinker *sh, struct shrink_co
> /*
> * Common wrappers for page allocation from compression wrappers
> */
> -struct folio *btrfs_alloc_compr_folio(struct btrfs_fs_info *fs_info)
> +struct folio *btrfs_alloc_compr_folio(struct btrfs_fs_info *fs_info, gfp_t gfp)
> {
> struct folio *folio = NULL;
>
> @@ -200,7 +200,7 @@ struct folio *btrfs_alloc_compr_folio(struct btrfs_fs_info *fs_info)
> return folio;
>
> alloc:
> - return folio_alloc(GFP_NOFS, fs_info->block_min_order);
> + return folio_alloc(gfp, fs_info->block_min_order);
> }
>
> void btrfs_free_compr_folio(struct folio *folio)
> @@ -368,7 +368,8 @@ struct compressed_bio *btrfs_alloc_compressed_write(struct btrfs_inode *inode,
> static noinline int add_ra_bio_pages(struct inode *inode,
> u64 compressed_end,
> struct compressed_bio *cb,
> - int *memstall, unsigned long *pflags)
> + int *memstall, unsigned long *pflags,
> + bool direct_reclaim)
> {
> struct btrfs_fs_info *fs_info = inode_to_fs_info(inode);
> pgoff_t end_index;
> @@ -376,6 +377,7 @@ static noinline int add_ra_bio_pages(struct inode *inode,
> u64 cur = cb->orig_bbio->file_offset + orig_bio->bi_iter.bi_size;
> u64 isize = i_size_read(inode);
> int ret;
> + gfp_t constraint_gfp, cache_gfp;
> struct folio *folio;
> struct extent_map *em;
> struct address_space *mapping = inode->i_mapping;
> @@ -405,6 +407,19 @@ static noinline int add_ra_bio_pages(struct inode *inode,
>
> end_index = (i_size_read(inode) - 1) >> PAGE_SHIFT;
>
> + /*
> + * Avoid direct reclaim when the caller does not allow it.
> + * Since add_ra_bio_pages is always speculative, suppress
> + * allocation warnings in either case.
> + */
> + if (!direct_reclaim) {
> + constraint_gfp = ~(__GFP_FS | __GFP_DIRECT_RECLAIM);
> + cache_gfp = (GFP_NOFS & ~__GFP_DIRECT_RECLAIM) | __GFP_NOWARN;
> + } else {
> + constraint_gfp = ~__GFP_FS;
> + cache_gfp = GFP_NOFS | __GFP_NOWARN;
> + }
> +
> while (cur < compressed_end) {
> pgoff_t page_end;
> pgoff_t pg_index = cur >> PAGE_SHIFT;
> @@ -434,12 +449,13 @@ static noinline int add_ra_bio_pages(struct inode *inode,
> continue;
> }
>
> - folio = filemap_alloc_folio(mapping_gfp_constraint(mapping, ~__GFP_FS),
> + folio = filemap_alloc_folio(mapping_gfp_constraint(mapping,
> + constraint_gfp) | __GFP_NOWARN,

It would be IMHO better to put the __GFP_NOWARN into the definition of
constraint_gfp so it's all done in one go.