Re: [PATCH v9 0/4] mm/page_owner: add per-fd filter infrastructure for print_mode and NUMA filtering
From: zhen.ni
Date: Thu Jun 18 2026 - 04:56:36 EST
在 2026/6/18 15:27, Vlastimil Babka (SUSE) 写道:
On 6/17/26 10:52, zhen.ni wrote:
在 2026/5/26 03:58, Andrew Morton 写道:
On Mon, 25 May 2026 16:16:48 +0800 Zhen Ni <zhen.ni@xxxxxxxxxxxx> wrote:Hi,
This patch series introduces per-file-descriptor filtering capabilities to the
page_owner feature.
Thanks again. AI review has found a bunch of new things to get worried
about:
https://sashiko.dev/#/patchset/20260525081652.2210206-1-zhen.ni@xxxxxxxxxxxx
Can this lead to an out-of-bounds memory read?
The NUMA filter in page_owner (mm/page_owner.c:790-798) bypasses
PF_POISONED_CHECK() to avoid triggering VM_BUG_ON during concurrent page
allocation/free:
int page_nid = memdesc_nid(page->flags);
When NODE_NOT_IN_PAGE_FLAGS is defined, memdesc_nid() performs unchecked
array access:
int memdesc_nid(memdesc_flags_t mdf)
{
return section_to_node_table[memdesc_section(mdf)];
}
If page->flags is poisoned, memdesc_section() can return a garbage
section_nr that causes out-of-bounds access.
## Lockless Access Safety Principle
The page_owner iterator runs without locks, meaning pages can be
allocated or freed concurrently. The fundamental design principle should be:
"It's acceptable to skip a small number of abnormal pages, but panics
must be prevented."
In lockless iteration, TOCTOU is unavoidable - even with reference
counting or RCU, page->flags can still be modified concurrently during
access. Zone locks prevent this but are prohibitively expensive.
## Proposed Solution: Add nid to struct page_owner
Record nid at allocation time when page state is stable, eliminating the
need to extract it from page->flags during iteration:
### 1. Modify struct page_owner
struct page_owner {
unsigned short order;
short last_migrate_reason;
...
pid_t tgid;
pid_t free_pid;
pid_t free_tgid;
int nid; // NEW
};
### 2. Record nid during allocation
static inline void __update_page_owner_handle(struct page *page, ...)
{
int nid = page_to_nid(page); // Safe in allocation context
for_each_page_ext(page, 1 << order, page_ext, iter) {
page_owner = get_page_owner(page_ext);
page_owner->nid = nid;
// ... other fields ...
}
}
### 3. Use saved nid in NUMA filter
if (state->nid_filter_enabled) {
int page_nid = page_owner->nid; // Direct read, safe
if (!node_isset(page_nid, state->nid_filter)) {
spin_unlock_irqrestore(&state->lock, flags);
goto ext_put_continue;
}
}
### 4. Update nid on page migration
// In split_page_owner() when page migrates
page_owner->nid = page_to_nid(&newfolio->page);
This (presumably LLM) suggestion is a, let's say "lazy" solution to the
problem, leading to more memory usage. I'd be surprised if it's not possible
to read the nid in a way that avoids the hazards. If page_to_nid() can
trigger a VM_BUG_ON(), then I'd add a version without that VM_BUG_ON(),
handling the poisoned state gracefully - if it's poisoned, return e.g.
NUMA_NO_NODE and skip the page, or something.
The remaining two issues can also be improved. If there are no
additional comments, I will proceed with sending v10.
Thanks,
Zhen
Thank you for the review.
I'd like to clarify that the "nid" member approach was my own design
after careful consideration of alternatives, not a suggestion from
automated tools.:)
In fact, LLMs suggested approaches like "check then access" which I had
already implemented and rejected in earlier versions due to TOCTOU
issues. The key insight is that page_owner serves as a buffering layer
for struct page, eliminating lockless access inconsistency entirely.
Think about it this way:
Even with extensive checks, page_owner->handle cannot guarantee the page
won't be freed when printed. The time window between checking
page->flags and accessing page->flags is unavoidable in lockless
iteration.
By recording nid at allocation time (when page->flags is stable),
page_owner becomes a "consistent snapshot" that can be safely accessed
without locks. This is the same principle that makes page_owner work as
a debug feature in the first place - it accepts a small inconsistency
window in exchange for lockless access.
Alternative approaches (adding poison checks, bounds checking) cannot
fully eliminate TOCTOU in lockless code - they just reduce the
probability. The buffering approach is the way to provide both
safety and lockless performance.
Thanks,
Zhen