Re: [PATCH v7 0/4] mm/page_owner: add per-fd filter infrastructure for print_mode and NUMA filtering
From: zhen.ni
Date: Tue May 19 2026 - 08:28:27 EST
On 2026/5/19 09:54, Harry Yoo wrote:
On 5/17/26 9:49 PM, Harry Yoo wrote:
On 5/15/26 6:19 PM, Zhen Ni wrote:
This patch series introduces per-file-descriptor filtering capabilities to the
page_owner feature.
Problem Statement
=================
In production environments with large memory configurations (e.g., 250GB+),
collecting page_owner information often results in files ranging from
several gigabytes to over 10GB. This creates significant challenges:
Just out of curiosity...
What are you trying to do with this data on production servers, and why
don't the existing per-NUMA statistics work for you?
If what you want to do is dump which code locations allocated how much memory on each NUMA node in the event of an OOM, you probably want to improve memory allocation profiling rather than page_owner.
IIRC there was discussion [1] on supporting memcg and numa awareness
and querying using ioctl().
[1] Memory Allocation Profiling upcoming features, LPC 2025,
https://lpc.events/event/19/contributions/2146
I want to identify which call paths allocated the memory on nearly-exhausted NUMA nodes. Existing per-NUMA statistics (such as numastat -m) cannot tell me this. For example:
# numastat -m
...
MemTotal 256923.12
MemFree 214873.33
MemUsed 42049.80
Active 9573.74
Inactive 20694.62
Active(anon) 876.71
Inactive(anon) 1161.75
Active(file) 8697.03
...
This breakdown is particularly problematic because I found that a significant portion of NIC pre-allocated ring buffers is not counted under any numastat -m sub-item. Memory allocated directly by the kernel is therefore difficult to detect by conventional means.
The biggest advantage of page_owner, for me, is that it walks the PFNs and prints every page together with its allocation call stack. Any memory that is difficult to detect by conventional means is therefore still captured by page_owner.
However, the large volume of data and repeated stack printing make the output very large and difficult to analyze. This is my original motivation for introducing the filter.
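To illustrate how much of the dump is repeated stacks, here is a minimal userspace sketch in Python that collapses a page_owner-style dump into one page count per unique call stack. It is not part of the patch series. The helper name pages_per_stack, the sample records, and the assumed record layout (a "Page allocated via order N" header, a PFN line, indented stack frames, a blank line between records) are illustrative assumptions and may not match every kernel version.

```python
import re
from collections import Counter

# Assumed header format of one page_owner record (hypothetical layout).
ORDER_RE = re.compile(r"^Page allocated via order (\d+)")


def pages_per_stack(dump: str) -> Counter:
    """Sum allocated pages per unique call stack in a page_owner-style dump."""
    totals = Counter()
    # Records are assumed to be separated by blank lines.
    for record in re.split(r"\n\s*\n", dump.strip()):
        lines = record.splitlines()
        if not lines:
            continue
        match = ORDER_RE.match(lines[0])
        if match is None:
            continue  # not an allocation record, skip it
        # Indented lines are treated as stack frames; the PFN line is not indented.
        stack = tuple(l.strip() for l in lines[1:] if l[:1].isspace())
        # An order-N allocation covers 2**N pages.
        totals[stack] += 1 << int(match.group(1))
    return totals


# Made-up sample data, for illustration only.
SAMPLE = """\
Page allocated via order 0, mask 0x0(), pid 1, tgid 1 (init), ts 1 ns
PFN 0x100 type Unmovable Block 0 type Unmovable Flags 0x0()
 alloc_a+0x1/0x2
 caller_x+0x3/0x4

Page allocated via order 2, mask 0x0(), pid 2, tgid 2 (netd), ts 2 ns
PFN 0x200 type Unmovable Block 1 type Unmovable Flags 0x0()
 alloc_a+0x1/0x2
 caller_x+0x3/0x4

Page allocated via order 1, mask 0x0(), pid 3, tgid 3 (netd), ts 3 ns
PFN 0x300 type Unmovable Block 1 type Unmovable Flags 0x0()
 alloc_b+0x5/0x6
"""

totals = pages_per_stack(SAMPLE)
for stack, pages in totals.most_common():
    print(pages, "pages:", " <- ".join(stack))
```

Post-processing like this still requires reading the full multi-gigabyte dump first, which is the cost that filtering at the source is meant to avoid.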
Memory Allocation Profiling is designed for low-overhead code tagging in production environments, and its information is relatively aggregated. I believe that even if it implements NUMA-specific filtering, it is unlikely to include stackdepot-level information, which is relatively heavyweight.
Best regards,
Zhen Ni