Re: [RFC PATCH] mm: net: disable kswapd for high-order network buffer allocation
From: wang lian
Date: Fri Apr 17 2026 - 04:17:36 EST
Hi Matthew, Barry,
> So, we try to do an order-3 allocation. kswapd runs and ...
> succeeds in creating order-3 pages? Or fails to?
From our reproducer runs, both happen. We observe intermittent order-3
successes, but also frequent high-order failures followed by order-0
fallback.
> If it fails, that's something we need to sort out.
Agreed. In this workload, the bottleneck appears to be contiguity, not
raw reclaimable memory shortage. Order-0 memory remains available while
suitable order-3 blocks are often unavailable.
> If it succeeds, now we have several order-3 pages, great. But where do
> they all go that we need to run kswapd again?
In our runs, order-3 pockets do show up, but they do not last long.
They get consumed quickly by ongoing skb demand, and the pressure returns.
To investigate this, we built a reproducer that keeps memory fragmented
while the network stack continuously issues order-3 allocations.[1][2]
Raw sample output (trimmed):
---------------------------------------------------------------------------------------------------
TIME | BUDDYINFO (Normal Zone) | MEMINFO | KSWAPD CPU & VMSTAT
---------------------------------------------------------------------------------------------------
11:08:11 | ord0:11622 ord3:0 | Free:96MB Avail:1309MB | CPU: 10.0% scan:83107932
[*] PHASE 3: Triggering Order-3 Pressure (UDP Storm).
11:08:15 | ord0:52079 ord3:0 | Free:273MB Avail:1300MB | CPU: 90.9% scan:85328881
11:08:16 | ord0:102895 ord3:0 | Free:477MB Avail:1309MB | CPU: 60.0% scan:85873777
11:08:17 | ord0:115459 ord3:5 | Free:517MB Avail:1284MB | CPU: 54.5% scan:86584389
11:08:18 | ord0:115164 ord3:0 | Free:509MB Avail:1107MB | CPU: 36.4% scan:87083561
---------------------------------------------------------------------------------------------------
What we observe is this: free memory is plentiful, order-0 pages are
abundant, and the network allocation has already fallen back to order-0
successfully. Everything looks normal on the surface, yet kswapd remains
trapped in a futile loop.
kswapd appears to be stuck in the following path:
wakeup_kswapd() -> pgdat_balanced() -> __zone_watermark_ok().
Specifically, in __zone_watermark_ok():
	/* For a high-order request, check at least one suitable page is free */
	for (o = order; o < NR_PAGE_ORDERS; o++) {
		struct free_area *area = &z->free_area[o];
		int mt;

		if (!area->nr_free)
			continue;

		for (mt = 0; mt < MIGRATE_PCPTYPES; mt++) {
			if (!free_area_empty(area, mt))
				return true;
		}
	}
Because our reproducer keeps fragmenting memory while the network stack
requests order-3, this loop never finds a free page at order 3 or above,
so __zone_watermark_ok() keeps returning false for the high-order
requirement, even though the system is functionally fine with order-0.
To be clear, we are not creating "artificial" fragments just for the
sake of it. Rather, the reproducer is designed to stress and expose an
existing feedback gap in the reclaim/compaction logic: it helps pinpoint
why kswapd keeps burning CPU cycles to satisfy a watermark that the
allocator has already abandoned in favor of the order-0 fallback.
A related discussion in [3] helps reduce vmpressure noise in this area.
Useful, but it does not close the contiguity gap by itself: high-order
wake/reclaim can still repeat when contiguous blocks cannot be formed.
This suggests we should take a much closer look at how kswapd behaves in
these scenarios. After reviewing everyone's input, we believe it is time
for some targeted work on handling these high-order wakeup issues.
We already have some rough ideas and plan further experiments in this
area. We would appreciate a broader discussion of this gap, which we may
have collectively overlooked so far.
Links:
[1] https://github.com/hack-kernel-just-for-fun/kswap/blob/main/kswapd_spin_repro.c
[2] https://github.com/hack-kernel-just-for-fun/kswap/blob/main/kswapd.sh
[3] https://lore.kernel.org/all/20260406195014.112521-1-jp.kobryn@xxxxxxxxx/#r
This was reproduced and cross-checked independently by our team
(Wang Lian <lianux.mm@xxxxxxxxx> and Kunwu Chan <kunwu.chan@xxxxxxxxx>).
--
Best Regards,
wang lian