Re: [RFC PATCH 2/2] filemap: use high-order folios in filemap sync RA

From: Barry Song

Date: Thu Apr 16 2026 - 01:31:00 EST

On Wed, Apr 15, 2026 at 7:47 PM Anatoly Stepanov
<stepanov.anatoly@xxxxxxxxxx> wrote:
>
> [Idea]
>
> If an mmap'ed file is accessed in such a way that async RA never
> kicks in, we might end up with only 0-order folios in the page cache.
>
> If fault_around_bytes is larger than a single page, then
> it's beneficial to use high-order folios, which brings a significant
> filemap_map_pages() speedup.

Please note that there have been many complaints that pages read
ahead on page fault, as well as fault-around pages, may never be used
afterwards [1]. The performance of filemap_map_pages() matters much
less than the cost of keeping pages that will never be accessed and
could otherwise be reclaimed. With large folios (sized to match
fault_around), a single young PTE can mark an entire folio as young,
which can be quite harmful to real workloads.
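
To make the concern concrete, here is a toy user-space model, not kernel
code. It assumes reclaim treats a folio as young as soon as any page in
it has been accessed; pages_kept_young() and its parameters are
hypothetical names used only for illustration.

```c
#include <stddef.h>

/*
 * Toy model (assumption, not a kernel API): count how many pages are
 * treated as "young" when the accessed state is tracked per folio.
 * accessed[i] != 0 means page i was touched through a PTE.
 */
static size_t pages_kept_young(const unsigned char *accessed,
			       size_t nr_pages, unsigned int order)
{
	size_t folio_pages = (size_t)1 << order;
	size_t kept = 0;

	for (size_t start = 0; start < nr_pages; start += folio_pages) {
		size_t end = start + folio_pages;
		int young = 0;

		if (end > nr_pages)
			end = nr_pages;

		/* One accessed page is enough to make the whole folio young. */
		for (size_t i = start; i < end; i++) {
			if (accessed[i]) {
				young = 1;
				break;
			}
		}
		if (young)
			kept += end - start;
	}
	return kept;
}
```

With 16 pages of which only the first is ever touched, this model keeps
1 page young at order 0, 4 pages at order 2, and all 16 at order 4.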

> So, let's just use fault_around_bytes as a starting point here.
>
> if an arch supports PTE-coalescing we can get more of those for free.
> (see arm64 example below)
>
> We don't save the new order to "ra->order", so if async RA happens
> later, it will start from order-0 as usual.
>
> [Things to be discussed]
>
> But at the same time, I can see a drawback for 16K and 64K pages: in
> this case fault_around will still be 64K by default. So it seems to
> make sense to express fault_around_bytes as an order-N multiple of
> PAGE_SIZE rather than as a fixed number of bytes.
>
> Another issue: when fault_around=0 but we'd still like to use
> high-order folios for sync RA, for cont-PTE for example, we could use
> something like "max(fault_around_order, cont_pte_order)".
>
> Or introduce some dedicated tunable like "sync_mmap_order".

I guess we could benefit from a small order, such as 1 or 2.
Order 4 is really too large for many systems, such as Android.

But it seems Matthew never likes new control knobs?
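
For reference, the order arithmetic under discussion can be sketched in
user space as follows. This is only an illustration under stated
assumptions: fault_around_order() and sync_ra_order() are hypothetical
helpers, not functions from the patch or the kernel, and cont_pte_order
is passed in by the caller rather than queried from the architecture.

```c
#include <stddef.h>

/*
 * Hypothetical helper: largest order such that
 * (page_size << order) <= fault_around_bytes.
 * Returns 0 when the window is no larger than one page.
 */
static unsigned int fault_around_order(size_t fault_around_bytes,
				       size_t page_size)
{
	unsigned int order = 0;

	while ((page_size << (order + 1)) <= fault_around_bytes)
		order++;
	return order;
}

/*
 * Hypothetical helper for the quoted
 * "max(fault_around_order, cont_pte_order)" idea.
 */
static unsigned int sync_ra_order(size_t fault_around_bytes,
				  size_t page_size,
				  unsigned int cont_pte_order)
{
	unsigned int fa = fault_around_order(fault_around_bytes, page_size);

	return fa > cont_pte_order ? fa : cont_pte_order;
}
```

With a 64K fault-around window this gives order 4 for 4K pages, order 2
for 16K pages and order 0 for 64K pages. With fault_around=0, the max()
variant falls back to whatever cont_pte_order the caller supplies.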

>
> [Benchmark]
>
> Simple benchmark below reading 100M file in 4M (RA size) chunks
> such that async RA doesn't kick in and the page cache ends up being
> filled up with 0-order folios.
>
> The patched kernel gives ~3 times increase in throughput,
> considering the page cache is filled up at the moment.

If we consider reclamation, it becomes a completely different story.

[1] https://lore.kernel.org/linux-mm/20250916072226.220426-1-liulei.rjpt@xxxxxxxx/

Thanks
Barry