Re: [PATCH 5/5] mm: page_alloc: defrag_mode kswapd/kcompactd watermarks
From: Vlastimil Babka
Date: Fri Apr 11 2025 - 12:54:57 EST
On 4/11/25 17:39, Johannes Weiner wrote:
> On Fri, Apr 11, 2025 at 10:19:58AM +0200, Vlastimil Babka wrote:
>> On 3/13/25 22:05, Johannes Weiner wrote:
>> > The previous patch added pageblock_order reclaim to kswapd/kcompactd,
>> > which helps, but produces only one block at a time. Allocation stalls
>> > and THP failure rates are still higher than they could be.
>> >
>> > To adequately reflect ALLOC_NOFRAGMENT demand for pageblocks, change
>> > the watermarking for kswapd & kcompactd: instead of targeting the high
>> > watermark in order-0 pages and checking for one suitable block, simply
>> > require that the high watermark is entirely met in pageblocks.
>>
>> Hrm.
>
> Hrm!
>
>> > @@ -2329,6 +2329,22 @@ static enum compact_result __compact_finished(struct compact_control *cc)
>> >  	if (!pageblock_aligned(cc->migrate_pfn))
>> >  		return COMPACT_CONTINUE;
>> >
>> > +	/*
>> > +	 * When defrag_mode is enabled, make kcompactd target
>> > +	 * watermarks in whole pageblocks. Because they can be stolen
>> > +	 * without polluting, no further fallback checks are needed.
>> > +	 */
>> > +	if (defrag_mode && !cc->direct_compaction) {
>> > +		if (__zone_watermark_ok(cc->zone, cc->order,
>> > +					high_wmark_pages(cc->zone),
>> > +					cc->highest_zoneidx, cc->alloc_flags,
>> > +					zone_page_state(cc->zone,
>> > +							NR_FREE_PAGES_BLOCKS)))
>> > +			return COMPACT_SUCCESS;
>> > +
>> > +		return COMPACT_CONTINUE;
>> > +	}
>>
>> Wonder if this ever succeeds in practice. Is high_wmark_pages() even aligned
>> to pageblock size? If not, and it's X pageblocks and a half, we will rarely
>> have NR_FREE_PAGES_BLOCKS cover all of that? Also concurrent allocations can
>> put us below high wmark quickly and then we never satisfy this?
>
> The high watermark is not aligned, but why does it have to be? It's a
> binary condition: met or not met. Compaction continues until it's met.

What I mean is, kswapd will reclaim until the high watermark, which would be
32.7 blocks, and then wake up kcompactd [*]. But kcompactd can only create up
to 32 blocks' worth of NR_FREE_PAGES_BLOCKS from that, so it has already lost
at that point? (unless there's concurrent freeing pushing it above the high
wmark)

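
To put numbers on it, here is a back-of-the-envelope sketch in userspace C.
It assumes pageblock_nr_pages is 512, which is what the block counts in the
zoneinfo output quoted further down imply (16746 / 512 is about 32.7), and
it ignores lowmem_reserve, unusable free pages and the per-order checks that
__zone_watermark_ok() also does. The helper names are made up for
illustration and are not kernel functions:

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: 512 pages per pageblock, inferred from the zoneinfo numbers. */
#define PAGEBLOCK_NR_PAGES 512UL

/* Hypothetical: pages in whole free blocks after perfect compaction. */
static unsigned long max_block_pages(unsigned long nr_free)
{
	return (nr_free / PAGEBLOCK_NR_PAGES) * PAGEBLOCK_NR_PAGES;
}

/* Simplified watermark check: whole-block pages must exceed the mark. */
static bool blocks_can_meet_mark(unsigned long nr_free, unsigned long mark)
{
	return max_block_pages(nr_free) > mark;
}

/* Hypothetical: free pages needed before whole blocks can exceed the mark. */
static unsigned long free_needed_for_mark(unsigned long mark)
{
	return (mark / PAGEBLOCK_NR_PAGES + 1) * PAGEBLOCK_NR_PAGES;
}

/* The DMA32 case: kswapd stops at high = 16746 free pages. */
static void dma32_example(void)
{
	unsigned long high = 16746;

	/* Perfect compaction of 16746 free pages yields 32 blocks. */
	assert(max_block_pages(high) == 32 * PAGEBLOCK_NR_PAGES);
	/* 32 blocks are below the mark, so the check cannot pass. */
	assert(!blocks_can_meet_mark(high, high));
	/* It takes 33 blocks' worth of free pages to get above it. */
	assert(free_needed_for_mark(high) == 33 * PAGEBLOCK_NR_PAGES);
}
```

So in this simplified model, if kswapd stops at exactly 16746 free pages,
kcompactd is 150 pages short of a 33rd block even with perfect compaction.
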
> NR_FREE_PAGES_BLOCKS moves in pageblock_nr_pages steps. This means
> it'll really work until align_up(highmark, pageblock_nr_pages), as
> that's when NR_FREE_PAGES_BLOCKS snaps above the (unaligned) mark. But
> that seems reasonable, no?

How can it snap if it doesn't have enough free pages? Unlike kswapd,
kcompactd doesn't create them, only defragments.

> The allocator side is using low/min, so we have the conventional
> hysteresis between consumer and producer.

Sure, but we cap kswapd at the high wmark, and the hunk quoted above also uses
the high wmark, so there's no hysteresis happening between kswapd and
kcompactd?

> For illustration, on my 2G test box, the watermarks in DMA32 look like
> this:
>
>   pages free     212057
>         boost    0
>         min      11164   (21.8 blocks)
>         low      13955   (27.3 blocks)
>         high     16746   (32.7 blocks)
>         promo    19537
>         spanned  456704
>         present  455680
>         managed  431617  (843.1 blocks)
>
> So there are several blocks between the kblahds wakeup and sleep. The
> first allocation to cut into a whole free block will decrease
> NR_FREE_PAGES_BLOCKS by a whole block. But subsequent allocs that fill
> the remaining space won't change that counter. So the distance between
> the watermarks didn't fundamentally change (modulo block rounding).
>
>> Doesn't it then happen that with defrag_mode, in practice kcompactd
>> basically always runs until the scanners meet?
>
> Tracing kcompactd calls to compaction_finished() with defrag_mode:
>
> @[COMPACT_CONTINUE]: 6955
> @[COMPACT_COMPLETE]: 19
> @[COMPACT_PARTIAL_SKIPPED]: 1
> @[COMPACT_SUCCESS]: 17
> @wakeuprequests: 3

OK that doesn't look that bad.

> Of course, similar to kswapd, it might not reach the watermarks and
> keep running if there is a continuous stream of allocations consuming
> the blocks it's making. Hence the ratio between wakeups & continues.
>
> But when demand stops, it'll balance the high mark and quit.

Again, since kcompactd can only defragment free space and not create it, it
may be trying in vain?

[*] Now, when checking the code for the handover between kswapd and kcompactd,
I think I found another problem?

We have:

kswapd_try_to_sleep()
  prepare_kswapd_sleep() - needs to succeed for wakeup_kcompactd()
    pgdat_balanced() - needs to be true for prepare_kswapd_sleep() to be true
      - with defrag_mode we want high watermark of NR_FREE_PAGES_BLOCKS, but
        we were only reclaiming until now and didn't wake up kcompactd and
        this actually prevents the wake up?
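
A toy model of that dependency, just to make the concern concrete. None of
this is the real mm/vmscan.c code: the struct, the toy_ function names and
the single-zone simplification are made up for illustration, and it assumes
that pgdat_balanced() in defrag_mode compares the high watermark against
NR_FREE_PAGES_BLOCKS as described above:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-zone stand-in for the counters involved. */
struct toy_zone {
	unsigned long free_pages;        /* all free pages */
	unsigned long free_pages_blocks; /* free pages in whole free pageblocks */
	unsigned long high_wmark;
};

/* Assumed behavior: defrag_mode checks whole blocks instead of all pages. */
static bool toy_pgdat_balanced(const struct toy_zone *z, bool defrag_mode)
{
	unsigned long free = defrag_mode ? z->free_pages_blocks : z->free_pages;

	return free > z->high_wmark;
}

/* Returns true if kcompactd would get woken up on the way to sleep. */
static bool toy_kswapd_try_to_sleep(const struct toy_zone *z, bool defrag_mode)
{
	/* The wakeup only happens once the node is considered balanced. */
	if (!toy_pgdat_balanced(z, defrag_mode))
		return false;

	return true;
}

static void toy_example(void)
{
	/* Plenty of free pages after reclaim, but fragmented. */
	struct toy_zone z = {
		.free_pages = 20000,
		.free_pages_blocks = 8192,
		.high_wmark = 16746,
	};

	/* Without defrag_mode, kswapd sleeps and hands over to kcompactd. */
	assert(toy_kswapd_try_to_sleep(&z, false));
	/* With defrag_mode, the block check fails, so no handover. */
	assert(!toy_kswapd_try_to_sleep(&z, true));
}
```

In this model the only thing that could raise free_pages_blocks is
compaction, which is exactly what doesn't get woken up.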