Re: [PATCH 5/5] mm: page_alloc: defrag_mode kswapd/kcompactd watermarks

From: Vlastimil Babka
Date: Tue Apr 15 2025 - 03:31:50 EST


On 4/13/25 04:20, Johannes Weiner wrote:
> On Fri, Apr 11, 2025 at 02:21:58PM -0400, Johannes Weiner wrote:
>> On Fri, Apr 11, 2025 at 06:51:51PM +0200, Vlastimil Babka wrote:
>> > [*] now when checking the code between kswapd and kcompactd handover, I
>> > think I found a another problem?
>> >
>> > we have:
>> > kswapd_try_to_sleep()
>> > prepare_kswapd_sleep() - needs to succeed for wakeup_kcompactd()
>> > pgdat_balanced() - needs to be true for prepare_kswapd_sleep() to be true
>> > - with defrag_mode we want high watermark of NR_FREE_PAGES_BLOCKS, but
>> > we were only reclaiming until now and didn't wake up kcompactd and
>> > this actually prevents the wake up?
>
> Coming back to this, there is indeed a defrag_mode issue. My
> apologies, I misunderstood what you were pointing at.
>
> Like I said, kswapd reverts to order-0 in some other place to go to
> sleep and trigger the handoff. At that point, defrag_mode also needs
> to revert to NR_FREE_PAGES.

I missed that revert to order-0 and that without it the current code also
wouldn't make sense. But I agree with the fix.

> It's curious that this didn't stand out in testing. On the contrary,
> kcompactd was still doing the vast majority of the compaction work. It
> looks like kswapd and direct workers on their own manage to balance
> NR_FREE_PAGES_BLOCK every so often, and then kswapd hands off. Given
> the low number of kcompactd wakeups, the consumers keep it going.
>
> So testing with this:
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index cc422ad830d6..c2aa0a4b67de 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -6747,8 +6747,11 @@ static bool pgdat_balanced(pg_data_t *pgdat, int order, int highest_zoneidx)
> /*
> * In defrag_mode, watermarks must be met in whole
> * blocks to avoid polluting allocator fallbacks.
> + *
> + * When kswapd has compact gap, check regular
> + * NR_FREE_PAGES and hand over to kcompactd.
> */
> - if (defrag_mode)
> + if (defrag_mode && order)
> item = NR_FREE_PAGES_BLOCKS;
> else
> item = NR_FREE_PAGES;
>
> I'm getting the following results:
>
> fallbackspeed/STUPID-DEFRAGMODE fallbackspeed/DEFRAGMODE
> Hugealloc Time mean 79381.34 ( +0.00%) 88126.12 ( +11.02%)
> Hugealloc Time stddev 85852.16 ( +0.00%) 135366.75 ( +57.67%)
> Kbuild Real time 249.35 ( +0.00%) 226.71 ( -9.04%)
> Kbuild User time 1249.16 ( +0.00%) 1249.37 ( +0.02%)
> Kbuild System time 171.76 ( +0.00%) 166.93 ( -2.79%)
> THP fault alloc 51666.87 ( +0.00%) 52685.60 ( +1.97%)
> THP fault fallback 16970.00 ( +0.00%) 15951.87 ( -6.00%)
> Direct compact fail 166.53 ( +0.00%) 178.93 ( +7.40%)
> Direct compact success 17.13 ( +0.00%) 4.13 ( -71.69%)
> Compact daemon scanned migrate 3095413.33 ( +0.00%) 9231239.53 ( +198.22%)
> Compact daemon scanned free 2155966.53 ( +0.00%) 7053692.87 ( +227.17%)
> Compact direct scanned migrate 265642.47 ( +0.00%) 68388.33 ( -74.26%)
> Compact direct scanned free 130252.60 ( +0.00%) 55634.87 ( -57.29%)
> Compact total migrate scanned 3361055.80 ( +0.00%) 9299627.87 ( +176.69%)
> Compact total free scanned 2286219.13 ( +0.00%) 7109327.73 ( +210.96%)
> Alloc stall 1890.80 ( +0.00%) 6297.60 ( +232.94%)
> Pages kswapd scanned 9043558.80 ( +0.00%) 5952576.73 ( -34.18%)
> Pages kswapd reclaimed 1891708.67 ( +0.00%) 1030645.00 ( -45.52%)
> Pages direct scanned 1017090.60 ( +0.00%) 2688047.60 ( +164.29%)
> Pages direct reclaimed 92682.60 ( +0.00%) 309770.53 ( +234.22%)
> Pages total scanned 10060649.40 ( +0.00%) 8640624.33 ( -14.11%)
> Pages total reclaimed 1984391.27 ( +0.00%) 1340415.53 ( -32.45%)
> Swap out 884585.73 ( +0.00%) 417781.93 ( -52.77%)
> Swap in 287106.27 ( +0.00%) 95589.73 ( -66.71%)
> File refaults 551697.60 ( +0.00%) 426474.80 ( -22.70%)
>
> Work has shifted from direct to kcompactd. In aggregate there is more
> compaction happening. Meanwhile aggregate reclaim and swapping drops
> quite substantially. %sys is down, so this is just more efficient.
>
> Reclaim and swapping are down substantially, which is great. But the
> reclaim work that remains has shifted somewhat to direct reclaim,
> which is unfortunate. THP delivery is also a tad worse, but still much
> better than !defrag_mode, so not too concerning. That part deserves a
> bit more thought.
>
> Overall, this looks good, though. I'll send a proper patch next week.
>
> Thanks for the review, Vlastimil.

NP!