Re: [PATCH] mm: page_ext: validate section in for_each_page_ext iterator

From: Ketan Kishore

Date: Fri Jun 19 2026 - 16:31:16 EST




On 6/19/2026 8:08 AM, Luiz Capitulino wrote:
On 2026-06-18 05:04, David Hildenbrand (Arm) wrote:
On 6/17/26 17:09, Ketan wrote:
The page_ext iteration API do not validate if the PFN still
belongs to a valid section while advancing the iterator.

When dynamically adding memory in hotplug path, it can lead
to a NULL pointer dereference during page_ext_lookup at the
boundary of the last valid section.

[   14.555124][  T846] Call trace:
[   14.555125][  T846]  lookup_page_ext+0x6c/0x108 (P)
[   14.555127][  T846]  page_ext_lookup+0x30/0x3c
[   14.555129][  T846]   +0x11c/0x260
[   14.571201][  T846]  __free_pages_ok+0x5e8/0x8e0
[   14.571204][  T846]  __free_pages_core+0x78/0xf0
[   14.571206][  T846]  generic_online_page+0x14/0x24
[   14.597782][  T846]  online_pages+0x178/0x30c
[   14.597784][  T846]  memory_block_change_state+0x284/0x32c
[   14.597787][  T846]  memory_subsys_online+0x4c/0x64
[   14.597789][  T846]  device_online+0x88/0xb0
[   14.597791][  T846]  online_memory_block+0x30/0x40
[   14.597793][  T846]  walk_memory_blocks+0xac/0xe8
[   14.597794][  T846]  add_memory_resource+0x280/0x298
[   14.656161][  T846]  add_memory+0x60/0x98

So, we do allocate the page_ext in memory_notify(MEM_GOING_ONLINE) -> ...
page_ext_callback() -> online_page_ext().

Which makes sure that all page_ext is actually allocated (for the full section).

So when we later end up in generic_online_page(), the page_ext should be there
and the memory section should be valid.


How can online_pages(), which onlines memory part of present memory sections,
online (free) memory that suddenly does not belong to a present memory section?

Something doesn't make sense here.

Ketan, would you please share the full OOPs message and maybe tell us
how you reproduce this issue?


Sure Luiz, below is the full OOPs with the call stack.
In one of our usecases, we try to add all the movable zone memory available to kernel after boot.
At the boundary of last section being added, the iterator for_each_page_ext (which in turn calls lookup_page_ext)
inside which the below fetched section is NULL for invalid pfn.
"struct mem_section *section = __pfn_to_section(pfn);
struct page_ext *page_ext = READ_ONCE(section->page_ext);" -> While accessing section for invalid PFN
we get the below NULL pointer dereference.

[   13.840511][  T846] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000010
[   13.850253][  T846] Mem abort info:
[   13.853859][  T846]   ESR = 0x0000000096000005
[   13.858441][  T846]   EC = 0x25: DABT (current EL), IL = 32 bits
[   13.864642][  T846]   SET = 0, FnV = 0
[   13.868510][  T846]   EA = 0, S1PTW = 0
[   13.872472][  T846]   FSC = 0x05: level 1 translation fault
[   13.878224][  T846] Data abort info:
[   13.881914][  T846]   ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
[   13.888283][  T846]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[   13.894211][  T846]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[   13.900398][  T846] user pgtable: 4k pages, 39-bit VAs, pgdp=000000089ff23000
[   13.907749][  T846] [0000000000000010] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000
[   13.917687][  T846] Internal error: Oops: 0000000096000005 [#1]  SMP
[   13.924220][  T846] Dumping ftrace buffer:
[   13.928436][  T846]    (ftrace buffer empty)
[   14.452210][  T846] CPU: 7 UID: 0 PID: 846 Comm: modprobe Tainted: G           OE       6.18.21-android17-4-maybe-dirty-debug #1 PREEMPT  df0e65d5b71bf0ce6ff80187ad4d31404755da65
[   14.468528][  T846] Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
[   14.484959][  T846] pstate: 43400005 (nZcv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
[   14.492820][  T846] pc : lookup_page_ext+0x6c/0x108
[   14.497840][  T846] lr : page_ext_lookup+0x30/0x3c
[   14.502766][  T846] sp : ffffffc08d21b250
[   14.506887][  T846] x29: ffffffc08d21b250 x28: ffffffe2e901f080 x27: ffffffe2e9920000
[   14.514927][  T846] x26: 0000000000000400 x25: ffffff882701a380 x24: ffffffe2e8f28000
[   14.522972][  T846] x23: 0000000030000000 x22: 0000000000c00000 x21: eaf5b162e68bf47c
[   14.531008][  T846] x20: 0000000000000000 x19: 0000000000c00000 x18: ffffffe2e8e4c380
[   14.539045][  T846] x17: 00000000c006287d x16: 00000000c006287d x15: 0000000073cc3b9a
[   14.539047][  T846] x14: 000000007c6d34b7 x13: 00000000d0fa10f2 x12: ffffff882701b128
[   14.539048][  T846] x11: ffffffe2e93630a8 x10: 0000000000000011 x9 : 0000000000000003
[   14.539050][  T846] x8 : 0000000000000000 x7 : 0000000000000000 x6 : ffffffe2e68b78e4
[   14.539051][  T846] x5 : 0000000000000000
[   14.555122][  T846]  x4 : 0000000000000000 x3 : 0000000000000002
[   14.555123][  T846] x2 : 0000000000000000 x1 : ffffffe2e7f30cab x0 : fffffffeee000000
[   14.555124][  T846] Call trace:
[   14.555125][  T846]  lookup_page_ext+0x6c/0x108 (P)
[   14.555127][  T846]  page_ext_lookup+0x30/0x3c
[   14.555129][  T846]  __reset_page_owner+0x11c/0x260
[   14.571201][  T846]  __free_pages_ok+0x5e8/0x8e0
[   14.571204][  T846]  __free_pages_core+0x78/0xf0
[   14.571206][  T846]  generic_online_page+0x14/0x24
[   14.597782][  T846]  online_pages+0x178/0x30c
[   14.597784][  T846]  memory_block_change_state+0x284/0x32c
[   14.597787][  T846]  memory_subsys_online+0x4c/0x64
[   14.597789][  T846]  device_online+0x88/0xb0
[   14.597791][  T846]  online_memory_block+0x30/0x40
[   14.597793][  T846]  walk_memory_blocks+0xac/0xe8
[   14.597794][  T846]  add_memory_resource+0x280/0x298
[   14.656161][  T846]  add_memory+0x60/0x98

As David already pointed out over this thread,
"for_each_page_ext() does a "__page_ext = page_ext_iter_next(&__iter)" and the end."
page_ext_iter_next tries to access the page_ext for an invalid pfn at the end.

Thanks & Regards,
Ketan



Add a valid-section check before looking up the next page_ext
so the iterator stops cleanly at section boundaries.

Fixes: 9039b9096ea2 ("mm: page_owner: use new iteration API")

If the problem is real, we'd want to CC stable.

Signed-off-by: Ketan Kishore <ketan.kishore@xxxxxxxxxxxxxxxx>
---
  mm/page_ext.c | 12 ++++++++++++
  1 file changed, 12 insertions(+)

diff --git a/mm/page_ext.c b/mm/page_ext.c
index e2e92bd27ebd..74067ea740fe 100644
--- a/mm/page_ext.c
+++ b/mm/page_ext.c
@@ -253,6 +253,18 @@ static struct page_ext *lookup_page_ext(const struct page *page)
  {
      unsigned long pfn = page_to_pfn(page);
      struct mem_section *section = __pfn_to_section(pfn);
+
+    /*
+     * section can be NULL when the page_ext iterator's for-loop increment
+     * computes a PFN one step beyond the last registered section. This
+     * occurs because pfn_to_page() uses __nr_to_section() which succeeds
+     * for unregistered sections that share a root array with registered
+     * sections,while __pfn_to_section() returns NULL for them.
+     *
+     */
+    if (!section)
+        return NULL;
+
      struct page_ext *page_ext = READ_ONCE(section->page_ext);

In addition to what Zi says, which will end up as:

    if (!section)
        return NULL;
    page_ext = READ_ONCE(section->page_ext);


drop that comment, not required. And likely best to just add a unlikely(),
because this doesn't usually happen.