Re: [PATCH 2/2] mm: page_alloc: tighten up find_suitable_fallback()

From: Vlastimil Babka
Date: Thu Apr 10 2025 - 04:53:34 EST


On 4/7/25 20:01, Johannes Weiner wrote:
> find_suitable_fallback() is not as efficient as it could be, and
> somewhat difficult to follow.
>
> 1. should_try_claim_block() is a loop invariant. There is no point in
> checking fallback areas if the caller is interested in claimable
> blocks but the order and the migratetype don't allow for that.
>
> 2. __rmqueue_steal() doesn't care about claimability, so it shouldn't
> have to run those tests.
>
> Different callers want different things from this helper:
>
> 1. __compact_finished() scans orders up until it finds a claimable block
> 2. __rmqueue_claim() scans orders down as long as blocks are claimable
> 3. __rmqueue_steal() doesn't care about claimability at all
>
> Move should_try_claim_block() out of the loop. Only test it for the
> two callers who care in the first place. Distinguish "no blocks" from
> "order + mt are not claimable" in the return value; __rmqueue_claim()
> can stop once order becomes unclaimable, __compact_finished() can keep
> advancing until order becomes claimable.
>
> Before:
>
> Performance counter stats for './run case-lru-file-mmap-read' (5 runs):
>
>       85,294.85 msec task-clock        #   5.644 CPUs utilized    ( +- 0.32% )
>          15,968      context-switches  # 187.209 /sec             ( +- 3.81% )
>             153      cpu-migrations    #   1.794 /sec             ( +- 3.29% )
>         801,808      page-faults       #   9.400 K/sec            ( +- 0.10% )
> 733,358,331,786      instructions      #    1.87 insn per cycle   ( +- 0.20% )  (64.94%)
> 392,622,904,199      cycles            #   4.603 GHz              ( +- 0.31% )  (64.84%)
> 148,563,488,531      branches          #   1.742 G/sec            ( +- 0.18% )  (63.86%)
>     152,143,228      branch-misses     #   0.10% of all branches  ( +- 1.19% )  (62.82%)
>
> 15.1128 +- 0.0637 seconds time elapsed ( +- 0.42% )
>
> After:
>
> Performance counter stats for './run case-lru-file-mmap-read' (5 runs):
>
>       84,380.21 msec task-clock        #   5.664 CPUs utilized    ( +- 0.21% )
>          16,656      context-switches  # 197.392 /sec             ( +- 3.27% )
>             151      cpu-migrations    #   1.790 /sec             ( +- 3.28% )
>         801,703      page-faults       #   9.501 K/sec            ( +- 0.09% )
> 731,914,183,060      instructions      #    1.88 insn per cycle   ( +- 0.38% )  (64.90%)
> 388,673,535,116      cycles            #   4.606 GHz              ( +- 0.24% )  (65.06%)
> 148,251,482,143      branches          #   1.757 G/sec            ( +- 0.37% )  (63.92%)
>     149,766,550      branch-misses     #   0.10% of all branches  ( +- 1.22% )  (62.88%)
>
> 14.8968 +- 0.0486 seconds time elapsed ( +- 0.33% )
>
> Signed-off-by: Johannes Weiner <hannes@xxxxxxxxxxx>

Yay, you found a way to get rid of the ugly "bool claim_only, bool
*claim_block" parameter combo. Great!
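
For anyone reading along without the diff, the return-value convention
described in the changelog can be sketched in userspace C roughly as
below. This is an illustration under stated assumptions, not the patch
code: every name ending in _sketch, the sentinel values -1 and -2, the
three-entry migratetype enum, the fallback table and the toy
claimability policy are all made up here. What it is meant to show is
the claimability test hoisted out of the fallback loop, and the two
kinds of callers reacting differently to "no blocks" versus "order + mt
are not claimable".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; none of these names or values come from the patch. */
enum { MT_UNMOVABLE, MT_MOVABLE, MT_RECLAIMABLE, NR_MT };
#define NR_ORDERS            11
#define CLAIM_ORDER_SKETCH   4     /* assumed threshold for the toy policy */
#define FALLBACK_NONE        (-1)  /* no free block of any fallback type */
#define FALLBACK_UNCLAIMABLE (-2)  /* order + mt do not allow claiming */

struct free_area_sketch {
	unsigned long nr_free[NR_MT];  /* free blocks per migratetype */
};

/* Assumed fallback preference list for each requested migratetype. */
static const int fallbacks_sketch[NR_MT][NR_MT - 1] = {
	[MT_UNMOVABLE]   = { MT_RECLAIMABLE, MT_MOVABLE },
	[MT_MOVABLE]     = { MT_RECLAIMABLE, MT_UNMOVABLE },
	[MT_RECLAIMABLE] = { MT_UNMOVABLE, MT_MOVABLE },
};

/* Toy policy: depends only on order and mt, never on the fallback type. */
static bool should_try_claim_sketch(unsigned int order, int mt)
{
	return order >= CLAIM_ORDER_SKETCH || mt != MT_MOVABLE;
}

static int find_fallback_sketch(const struct free_area_sketch *area,
				unsigned int order, int mt, bool claimable)
{
	assert(mt >= 0 && mt < NR_MT);

	/* Loop invariant, so test it once, and only for callers who care. */
	if (claimable && !should_try_claim_sketch(order, mt))
		return FALLBACK_UNCLAIMABLE;

	for (int i = 0; i < NR_MT - 1; i++) {
		int fb = fallbacks_sketch[mt][i];

		if (area->nr_free[fb] > 0)
			return fb;
	}
	return FALLBACK_NONE;
}

/* Claim-style caller: scan orders down, stop once order is unclaimable. */
static int claim_scan_sketch(const struct free_area_sketch areas[NR_ORDERS],
			     unsigned int min_order, int mt, int *fallback_mt)
{
	for (int order = NR_ORDERS - 1; order >= (int)min_order; order--) {
		int fb = find_fallback_sketch(&areas[order], order, mt, true);

		if (fb == FALLBACK_NONE)
			continue;	/* nothing here, try a smaller order */
		if (fb == FALLBACK_UNCLAIMABLE)
			break;		/* smaller orders will not be claimable either */
		*fallback_mt = fb;
		return order;
	}
	return -1;
}

/* Compaction-style caller: scan orders up, keep going past unclaimable ones. */
static bool compact_scan_sketch(const struct free_area_sketch areas[NR_ORDERS],
				unsigned int start_order, int mt)
{
	for (unsigned int order = start_order; order < NR_ORDERS; order++) {
		if (find_fallback_sketch(&areas[order], order, mt, true) >= 0)
			return true;
	}
	return false;
}
```

A steal-style caller would simply pass claimable=false and never see
the "unclaimable" sentinel, which is what lets the bool/pointer pair go
away in this sketch.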

Reviewed-by: Vlastimil Babka <vbabka@xxxxxxx>