Re: [PATCH v9 0/4] mm/page_owner: add per-fd filter infrastructure for print_mode and NUMA filtering
From: zhen.ni
Date: Tue Jun 23 2026 - 09:19:31 EST
在 2026/6/23 14:52, Vlastimil Babka (SUSE) 写道:
On 6/18/26 10:15, zhen.ni wrote:Hi,
在 2026/6/18 15:27, Vlastimil Babka (SUSE) 写道:
On 6/17/26 10:52, zhen.ni wrote:
在 2026/5/26 03:58, Andrew Morton 写道:
On Mon, 25 May 2026 16:16:48 +0800 Zhen Ni <zhen.ni@xxxxxxxxxxxx> wrote:Hi,
This patch series introduces per-file-descriptor filtering capabilities to the
page_owner feature.
Thanks again. AI review has found a bunch of new things to get worried
about:
https://sashiko.dev/#/patchset/20260525081652.2210206-1-zhen.ni@xxxxxxxxxxxx
Can this lead to an out-of-bounds memory read?
The NUMA filter in page_owner (mm/page_owner.c:790-798) bypasses
PF_POISONED_CHECK() to avoid triggering VM_BUG_ON during concurrent page
allocation/free:
int page_nid = memdesc_nid(page->flags);
When NODE_NOT_IN_PAGE_FLAGS is defined, memdesc_nid() performs unchecked
array access:
int memdesc_nid(memdesc_flags_t mdf)
{
return section_to_node_table[memdesc_section(mdf)];
}
If page->flags is poisoned, memdesc_section() can return a garbage
section_nr that causes out-of-bounds access.
## Lockless Access Safety Principle
The page_owner iterator runs without locks, meaning pages can be
allocated or freed concurrently. The fundamental design principle should be:
"It's acceptable to skip a small number of abnormal pages, but panics
must be prevented."
In lockless iteration, TOCTOU is unavoidable - even with reference
counting or RCU, page->flags can still be modified concurrently during
access. Zone locks prevent this but are prohibitively expensive.
## Proposed Solution: Add nid to struct page_owner
Record nid at allocation time when page state is stable, eliminating the
need to extract it from page->flags during iteration:
### 1. Modify struct page_owner
struct page_owner {
unsigned short order;
short last_migrate_reason;
...
pid_t tgid;
pid_t free_pid;
pid_t free_tgid;
int nid; // NEW
};
### 2. Record nid during allocation
static inline void __update_page_owner_handle(struct page *page, ...)
{
int nid = page_to_nid(page); // Safe in allocation context
for_each_page_ext(page, 1 << order, page_ext, iter) {
page_owner = get_page_owner(page_ext);
page_owner->nid = nid;
// ... other fields ...
}
}
### 3. Use saved nid in NUMA filter
if (state->nid_filter_enabled) {
int page_nid = page_owner->nid; // Direct read, safe
if (!node_isset(page_nid, state->nid_filter)) {
spin_unlock_irqrestore(&state->lock, flags);
goto ext_put_continue;
}
}
### 4. Update nid on page migration
// In split_page_owner() when page migrates
page_owner->nid = page_to_nid(&newfolio->page);
This (presumably LLM) suggestion is a, let's say "lazy" solution to the
problem, leading to more memory usage. I'd be surprised if it's not possible
to read the nid in a way that avoids the hazards. If page_to_nid() can
trigger a VM_BUG_ON(), then I'd add a version without that VM_BUG_ON(),
handling the poisoned state gracefully - if it's poisoned, return e.g.
NUMA_NO_NODE and skip the page, or something.
The remaining two issues can also be improved. If there are no
additional comments, I will proceed with sending v10.
Thanks,
Zhen
Thank you for the review.
I'd like to clarify that the "nid" member approach was my own design
after careful consideration of alternatives, not a suggestion from
automated tools.:)
Alright, good :)
In fact, LLMs suggested approaches like "check then access" which I had
already implemented and rejected in earlier versions due to TOCTOU
issues. The key insight is that page_owner serves as a buffering layer
for struct page, eliminating lockless access inconsistency entirely.
Sure, but is the elimination necessary if it has a memory cost?
Think about it this way:
Even with extensive checks, page_owner->handle cannot guarantee the page
won't be freed when printed. The time window between checking
page->flags and accessing page->flags is unavoidable in lockless
iteration.
Of course, but that's an argument saying that page owner as a whole can't be
a perfect snapshot anyway. Then it follows that "nid" snapshot doesn't need
to be perfect either.
By recording nid at allocation time (when page->flags is stable),
page_owner becomes a "consistent snapshot" that can be safely accessed
without locks. This is the same principle that makes page_owner work as
a debug feature in the first place - it accepts a small inconsistency
window in exchange for lockless access.
And that means we can accept some more small inconsistency for nid, without
occupying more memory.
Alternative approaches (adding poison checks, bounds checking) cannot
fully eliminate TOCTOU in lockless code - they just reduce the
probability. The buffering approach is the way to provide both
safety and lockless performance.
Please explain how the following "page_to_nid_robust()" (or similar name)
code cannot eliminate the out of bounds accesses completely. The section/nid
in page flags is possibly the most stable part of the whole struct page, it
doesn't change as page is allocated, freed, or how it's used. The only
problem is the poison.
memdesc_flags_t flags = READ_ONCE(page->flags);
// our flags variable is no longer subject to TOCTOU
if (flags.f == PAGE_POISON_PATTERN)
return NUMA_NO_NODE;
return memdesc_nid(flags);
-
Thanks,
Zhen
I agree that saving page->flags to a local variable can prevent TOCTOU. Both the nid member and page_to_nid_robust approaches are functionally equivalent, but the memory cost of the nid member is real.
Here's how I'm handling this logic:
if (state->nid_filter_enabled) {
int nid;
memdesc_flags_t page_flags = READ_ONCE(page->flags);
if (page_flags.f == PAGE_POISON_PATTERN) {
spin_unlock_irqrestore(&state->lock, flags);
goto ext_put_continue;
}
nid = memdesc_nid(page_flags);
if (!node_isset(nid, state->nid_filter)) {
spin_unlock_irqrestore(&state->lock, flags);
goto ext_put_continue;
}
}
This might not be suited to be a separate public function - here we
just need to check if the page is poisoned to prevent crashes. If it
were a public function, it might need to check if the page exists
before getting nid.
If you have further suggestions, please let me know.
Thanks,
Zhen