Re: [RFC PATCH] mm: net: disable kswapd for high-order network buffer allocation

From: Eric Dumazet

Date: Mon Oct 13 2025 - 14:54:10 EST


On Mon, Oct 13, 2025 at 3:16 AM Barry Song <21cnbao@xxxxxxxxx> wrote:
>
> From: Barry Song <v-songbaohua@xxxxxxxx>
>
> On phones, we have observed significant heating when running apps
> with high network bandwidth. This is caused by the network stack frequently
> waking kswapd for order-3 allocations. As a result, memory reclamation becomes
> constantly active, even though plenty of memory is still available for network
> allocations, which can fall back to order-0.
>
> Commit ce27ec60648d ("net: add high_order_alloc_disable sysctl/static key")
> introduced high_order_alloc_disable for the transmit (TX) path
> (skb_page_frag_refill()) to mitigate some memory reclamation issues,
> allowing the TX path to fall back to order-0 immediately, while leaving the
> receive (RX) path (__page_frag_cache_refill()) unaffected. Users are
> generally unaware of the sysctl and cannot easily adjust it for specific use
> cases. Enabling high_order_alloc_disable also completely disables the
> benefit of order-3 allocations. Additionally, the sysctl does not apply to the
> RX path.
>
> An alternative approach is to disable kswapd for these frequent
> allocations and provide best-effort order-3 service for both TX and RX paths,
> while removing the sysctl entirely.
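A minimal user-space model of the pattern described above, assuming simplified semantics: try an opportunistic order-3 allocation with direct reclaim masked out, then fall back to order-0. The flag names and bit values are illustrative stand-ins for the kernel's __GFP_* flags (the real ones live in include/linux/gfp_types.h). `toy_zone`, `toy_alloc()` and `frag_refill()` are hypothetical, and the `no_kswapd` switch models the proposed change of also clearing kswapd reclaim for the high-order attempt.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins for __GFP_* bits; NOT the kernel's real values. */
#define GFP_DIRECT_RECLAIM (1u << 0)
#define GFP_KSWAPD_RECLAIM (1u << 1)
#define GFP_COMP           (1u << 2)
#define GFP_NOWARN         (1u << 3)
#define GFP_NORETRY        (1u << 4)
/* Roughly what a sleepable caller passes: both reclaim modes allowed. */
#define GFP_KERNEL_LIKE    (GFP_DIRECT_RECLAIM | GFP_KSWAPD_RECLAIM)

#define FRAG_ORDER 3 /* order-3: 8 physically contiguous pages */

/* Mask for the opportunistic high-order attempt. Direct reclaim is always
 * masked out; no_kswapd models masking kswapd reclaim as well. */
static unsigned int high_order_mask(unsigned int gfp, bool no_kswapd)
{
	unsigned int clear = GFP_DIRECT_RECLAIM;

	if (no_kswapd)
		clear |= GFP_KSWAPD_RECLAIM;
	return (gfp & ~clear) | GFP_COMP | GFP_NOWARN | GFP_NORETRY;
}

/* Hypothetical allocator state: memory is plentiful, so order-0 always
 * succeeds, but free order-3 blocks may be fragmented away. */
struct toy_zone {
	bool order3_available;
	unsigned int kswapd_wakeups;
};

/* Returns the order obtained, or -1 if the request failed. */
static int toy_alloc(struct toy_zone *z, unsigned int gfp, unsigned int order)
{
	assert(order == 0 || order == FRAG_ORDER);
	if (order == 0 || z->order3_available)
		return (int)order;
	/* A failing high-order request wakes kswapd if kswapd reclaim is allowed. */
	if (gfp & GFP_KSWAPD_RECLAIM)
		z->kswapd_wakeups++;
	return -1; /* no direct reclaim, no retry: give up quickly */
}

/* Try order-3 first, then fall back to order-0. */
static int frag_refill(struct toy_zone *z, unsigned int gfp, bool no_kswapd)
{
	int got = toy_alloc(z, high_order_mask(gfp, no_kswapd), FRAG_ORDER);

	if (got >= 0)
		return got;
	return toy_alloc(z, gfp, 0);
}
```

With no free order-3 block, the default path wakes kswapd on every refill even though the order-0 fallback succeeds; with `no_kswapd` set, the refill still falls back but leaves kswapd alone.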
>
>
...

> Signed-off-by: Barry Song <v-songbaohua@xxxxxxxx>
> ---
> Documentation/admin-guide/sysctl/net.rst | 12 ------------
> include/net/sock.h | 1 -
> mm/page_frag_cache.c | 2 +-
> net/core/sock.c | 8 ++------
> net/core/sysctl_net_core.c | 7 -------
> 5 files changed, 3 insertions(+), 27 deletions(-)
>
> diff --git a/Documentation/admin-guide/sysctl/net.rst b/Documentation/admin-guide/sysctl/net.rst
> index 2ef50828aff1..b903bbae239c 100644
> --- a/Documentation/admin-guide/sysctl/net.rst
> +++ b/Documentation/admin-guide/sysctl/net.rst
> @@ -415,18 +415,6 @@ GRO has decided not to coalesce, it is placed on a per-NAPI list. This
> list is then passed to the stack when the number of segments reaches the
> gro_normal_batch limit.
>
> -high_order_alloc_disable
> -------------------------
> -
> -By default the allocator for page frags tries to use high order pages (order-3
> -on x86). While the default behavior gives good results in most cases, some users
> -might have hit a contention in page allocations/freeing. This was especially
> -true on older kernels (< 5.14) when high-order pages were not stored on per-cpu
> -lists. This allows to opt-in for order-0 allocation instead but is now mostly of
> -historical importance.
> -

The sysctl is quite useful for testing purposes, say on a freshly
booted host, with plenty of free memory.

Also, having order-3 pages when possible is quite important for IOMMU use cases.
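A back-of-the-envelope illustration of the IOMMU point, assuming 4 KiB pages and that each physically contiguous block is DMA-mapped as a single unit (a simplification; real drivers and IOMMU setups vary). `mappings_needed()` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* Number of separate DMA mappings needed to cover `bytes` when each mapping
 * spans one physically contiguous block of 2^order pages. */
static size_t mappings_needed(size_t bytes, unsigned int order, size_t page_size)
{
	size_t block;

	assert(page_size > 0 && order < 16);
	block = page_size << order;          /* bytes per contiguous block */
	return (bytes + block - 1) / block;  /* round up */
}
```

Under these assumptions a 64 KiB payload needs 2 mappings with order-3 blocks instead of 16 with order-0 pages, i.e. 8x fewer map/unmap operations.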

Perhaps kswapd should have some kind of heuristic not to start if it has
already run recently.
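One possible shape for such a heuristic, as a user-space sketch only: remember when kswapd last finished a run and suppress new wakeups inside a short window. `toy_kswapd`, `should_wake_kswapd()`, `kswapd_run_finished()` and the 100 ms window are all hypothetical, not existing kernel interfaces or tunables.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical backoff window; no such tunable exists in the kernel. */
#define KSWAPD_BACKOFF_MS 100u

struct toy_kswapd {
	bool has_run;
	uint64_t last_run_end_ms; /* monotonic clock, in milliseconds */
};

/* Skip the wakeup if kswapd finished a run less than KSWAPD_BACKOFF_MS ago. */
static bool should_wake_kswapd(const struct toy_kswapd *k, uint64_t now_ms)
{
	if (!k->has_run)
		return true;
	assert(now_ms >= k->last_run_end_ms); /* clock must be monotonic */
	return now_ms - k->last_run_end_ms >= KSWAPD_BACKOFF_MS;
}

/* Record the end of a kswapd run. */
static void kswapd_run_finished(struct toy_kswapd *k, uint64_t now_ms)
{
	k->has_run = true;
	k->last_run_end_ms = now_ms;
}
```

A single timestamp keeps the check cheap; how long the window should be, and whether it should apply only to opportunistic high-order requests, is left open.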

I am guessing phones do not need to send 1.6 Tbit per second on
network devices (yet), so one option could be to disable high-order
allocations via this sysctl in your boot scripts.
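A sketch of what such a boot-time step does, equivalent to `sysctl -w net.core.high_order_alloc_disable=1`: write "1" to the corresponding procfs file. The helper `write_sysctl()` is hypothetical. Writing the real file requires root, and per the patch description above the knob only affects the TX path (skb_page_frag_refill()).

```c
#include <assert.h>
#include <stdio.h>

#define HIGH_ORDER_SYSCTL "/proc/sys/net/core/high_order_alloc_disable"

/* Write a value to a sysctl file; returns 0 on success, -1 on error. */
static int write_sysctl(const char *path, const char *value)
{
	FILE *f;

	assert(path != NULL && value != NULL);
	f = fopen(path, "w");
	if (!f)
		return -1;
	if (fputs(value, f) == EOF) {
		fclose(f);
		return -1;
	}
	/* fclose() flushes; a failed flush means the write did not land. */
	return fclose(f) == 0 ? 0 : -1;
}
```

A boot script running as root would call `write_sysctl(HIGH_ORDER_SYSCTL, "1\n")` once, before network traffic ramps up.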