Re: [PATCH 1/1] iomap: avoid compaction for costly folio order allocation

From: Matthew Wilcox

Date: Sat Apr 04 2026 - 00:15:32 EST

On Fri, Apr 03, 2026 at 07:35:34PM +0000, Salvatore Dipietro wrote:
> Commit 5d8edfb900d5 ("iomap: Copy larger chunks from userspace")
> introduced high-order folio allocations in the buffered write
> path. When memory is fragmented, each failed allocation triggers
> compaction and drain_all_pages() via __alloc_pages_slowpath(),
> causing a 0.75x throughput drop on pgbench (simple-update) with
> 1024 clients on a 96-vCPU arm64 system.
>
> Strip __GFP_DIRECT_RECLAIM from folio allocations in
> iomap_get_folio() when the order exceeds PAGE_ALLOC_COSTLY_ORDER,
> making them purely opportunistic.

If you look at __filemap_get_folio_mpol(), that's kind of being tried
already:

	if (order > min_order)
		alloc_gfp |= __GFP_NORETRY | __GFP_NOWARN;

 * %__GFP_NORETRY: The VM implementation will try only very lightweight
 * memory direct reclaim to get some memory under memory pressure (thus
 * it can sleep). It will avoid disruptive actions like OOM killer. The
 * caller must handle the failure which is quite likely to happen under
 * heavy memory pressure. The flag is suitable when failure can easily be
 * handled at small cost, such as reduced throughput.

which, from the description, seemed like the right approach. So either
the description or the implementation should be updated, I suppose?

Now, what happens if you change those two lines to:

	if (order > min_order) {
		alloc_gfp &= ~__GFP_DIRECT_RECLAIM;
		alloc_gfp |= __GFP_NOWARN;
	}

Do you recover the performance?
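
For anyone following along, the difference between the two variants can be
sketched in userspace. This is only an illustration: the DEMO_* bit values
and the helper names (gfp_current, gfp_proposed, may_direct_reclaim) are
made up for the example and are not the kernel's definitions. The one thing
it assumes from the kernel is that the reclaim mask is the OR of a
direct-reclaim bit and a kswapd-reclaim bit.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's gfp bits; the values are
 * invented for this sketch and do not match the real definitions. */
typedef unsigned int demo_gfp_t;
#define DEMO_GFP_DIRECT_RECLAIM	0x01u
#define DEMO_GFP_KSWAPD_RECLAIM	0x02u
#define DEMO_GFP_NORETRY	0x04u
#define DEMO_GFP_NOWARN		0x08u
#define DEMO_GFP_RECLAIM	(DEMO_GFP_DIRECT_RECLAIM | DEMO_GFP_KSWAPD_RECLAIM)

/* Existing logic: the caller may still enter direct reclaim, but is
 * asked to give up early and stay quiet on failure. */
static demo_gfp_t gfp_current(demo_gfp_t gfp, unsigned int order,
			      unsigned int min_order)
{
	if (order > min_order)
		gfp |= DEMO_GFP_NORETRY | DEMO_GFP_NOWARN;
	return gfp;
}

/* Suggested experiment: drop only the direct-reclaim bit, leaving the
 * kswapd-reclaim bit untouched. */
static demo_gfp_t gfp_proposed(demo_gfp_t gfp, unsigned int order,
			       unsigned int min_order)
{
	if (order > min_order) {
		gfp &= ~DEMO_GFP_DIRECT_RECLAIM;
		gfp |= DEMO_GFP_NOWARN;
	}
	return gfp;
}

/* True if the mask still permits the caller to reclaim directly. */
static bool may_direct_reclaim(demo_gfp_t gfp)
{
	return (gfp & DEMO_GFP_DIRECT_RECLAIM) != 0;
}
```

With a mask that starts as DEMO_GFP_RECLAIM, gfp_current() leaves the
direct-reclaim bit set for an order above min_order, while gfp_proposed()
clears it and keeps the kswapd bit. An allocation at min_order is unchanged
by either helper.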