Re: [PATCH 5/5] mm: page_alloc: defrag_mode kswapd/kcompactd watermarks
From: Vlastimil Babka
Date: Fri Apr 11 2025 - 04:20:17 EST
On 3/13/25 22:05, Johannes Weiner wrote:
> The previous patch added pageblock_order reclaim to kswapd/kcompactd,
> which helps, but produces only one block at a time. Allocation stalls
> and THP failure rates are still higher than they could be.
>
> To adequately reflect ALLOC_NOFRAGMENT demand for pageblocks, change
> the watermarking for kswapd & kcompactd: instead of targeting the high
> watermark in order-0 pages and checking for one suitable block, simply
> require that the high watermark is entirely met in pageblocks.
Hrm.
> ---
> include/linux/mmzone.h | 1 +
> mm/compaction.c | 37 ++++++++++++++++++++++++++++++-------
> mm/internal.h | 1 +
> mm/page_alloc.c | 29 +++++++++++++++++++++++------
> mm/vmscan.c | 15 ++++++++++++++-
> mm/vmstat.c | 1 +
> 6 files changed, 70 insertions(+), 14 deletions(-)
>
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index dbb0ad69e17f..37c29f3fbca8 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -138,6 +138,7 @@ enum numa_stat_item {
> enum zone_stat_item {
> /* First 128 byte cacheline (assuming 64 bit words) */
> NR_FREE_PAGES,
> + NR_FREE_PAGES_BLOCKS,
> NR_ZONE_LRU_BASE, /* Used only for compaction and reclaim retry */
> NR_ZONE_INACTIVE_ANON = NR_ZONE_LRU_BASE,
> NR_ZONE_ACTIVE_ANON,
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 036353ef1878..4a2ccb82d0b2 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -2329,6 +2329,22 @@ static enum compact_result __compact_finished(struct compact_control *cc)
> if (!pageblock_aligned(cc->migrate_pfn))
> return COMPACT_CONTINUE;
>
> + /*
> + * When defrag_mode is enabled, make kcompactd target
> + * watermarks in whole pageblocks. Because they can be stolen
> + * without polluting, no further fallback checks are needed.
> + */
> + if (defrag_mode && !cc->direct_compaction) {
> + if (__zone_watermark_ok(cc->zone, cc->order,
> + high_wmark_pages(cc->zone),
> + cc->highest_zoneidx, cc->alloc_flags,
> + zone_page_state(cc->zone,
> + NR_FREE_PAGES_BLOCKS)))
> + return COMPACT_SUCCESS;
> +
> + return COMPACT_CONTINUE;
> + }
Wonder if this ever succeeds in practice. Is high_wmark_pages() even aligned
to pageblock size? If not, say it's X pageblocks and a half, we will rarely
have NR_FREE_PAGES_BLOCKS cover all of that? Also, concurrent allocations can
put us below the high wmark quickly, and then we never satisfy this?
Doesn't it then happen that with defrag_mode, kcompactd in practice basically
always runs until the scanners meet?
Maybe the check could instead e.g. compare NR_FREE_PAGES (aligned down to
pageblock size, or even with some extra slack?) with NR_FREE_PAGES_BLOCKS?
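To make the suggestion concrete, the alternative termination check could look
something like the sketch below. This is a self-contained userspace model of
the arithmetic only, not kernel code: the 512-page block size, the function
name, and the slack parameter are all illustrative assumptions.

```c
#include <assert.h>
#include <stdbool.h>

#define PAGEBLOCK_NR_PAGES 512UL /* e.g. 2MB blocks with 4kB pages */

/*
 * Instead of requiring NR_FREE_PAGES_BLOCKS to cover the (possibly
 * unaligned) high watermark, compare it against NR_FREE_PAGES rounded
 * down to a whole number of pageblocks, optionally minus some slack,
 * so the target is reachable even when free memory is X-and-a-half
 * blocks' worth.
 */
static bool blocks_cover_free_pages(unsigned long nr_free_pages,
				    unsigned long nr_free_pages_blocks,
				    unsigned long slack_blocks)
{
	unsigned long target_blocks = nr_free_pages / PAGEBLOCK_NR_PAGES;

	if (target_blocks > slack_blocks)
		target_blocks -= slack_blocks;
	else
		target_blocks = 0;

	return nr_free_pages_blocks >= target_blocks * PAGEBLOCK_NR_PAGES;
}
```

With this shape, 2.5 blocks' worth of free pages only requires 2 whole free
pageblocks (or fewer, with slack), rather than never being satisfiable.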
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -6724,11 +6724,24 @@ static bool pgdat_balanced(pg_data_t *pgdat, int order, int highest_zoneidx)
> * meet watermarks.
> */
> for_each_managed_zone_pgdat(zone, pgdat, i, highest_zoneidx) {
> + unsigned long free_pages;
> +
> if (sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING)
> mark = promo_wmark_pages(zone);
> else
> mark = high_wmark_pages(zone);
> - if (zone_watermark_ok_safe(zone, order, mark, highest_zoneidx))
I think you just removed the only user of this _safe() function. Is the
cpu-drift control it does no longer necessary?
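For context, the drift in question comes from the per-cpu batching of the
vmstat counters: the cheap global read can be off by the summed per-cpu
deltas, which is why the _safe() variant falls back to a snapshot read near
the watermark. A minimal userspace model of that idea (names and the 4-CPU
layout are illustrative, not the kernel's actual structures):

```c
#include <assert.h>

#define NR_CPUS 4

/*
 * Model of a batched per-cpu counter: the global value lags behind
 * because each CPU accumulates updates in a local delta first.
 */
struct drifty_counter {
	long global;            /* cheap read: may be stale */
	long pcpu_diff[NR_CPUS];
};

/* Fast read, analogous to zone_page_state(): tolerates drift. */
static long counter_read(const struct drifty_counter *c)
{
	return c->global;
}

/*
 * Accurate read, analogous to zone_page_state_snapshot(): fold in
 * the per-cpu deltas. The _safe() watermark check pays this cost
 * only when the fast read lands close enough to the watermark that
 * drift could flip the result.
 */
static long counter_read_snapshot(const struct drifty_counter *c)
{
	long v = c->global;

	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		v += c->pcpu_diff[cpu];
	return v < 0 ? 0 : v;
}
```

If the plain __zone_watermark_ok() path replaces the _safe() caller here,
pgdat_balanced() only ever sees the fast read.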
> +
> + /*
> + * In defrag_mode, watermarks must be met in whole
> + * blocks to avoid polluting allocator fallbacks.
> + */
> + if (defrag_mode)
> + free_pages = zone_page_state(zone, NR_FREE_PAGES_BLOCKS);
> + else
> + free_pages = zone_page_state(zone, NR_FREE_PAGES);
> +
> + if (__zone_watermark_ok(zone, order, mark, highest_zoneidx,
> + 0, free_pages))
> return true;
> }
>