Re: [PATCH v4] mm/alloc_tag: replace fixed-size early PFN array with dynamic linked list

From: Hao Ge

Date: Tue May 05 2026 - 22:25:11 EST


Hi Andrew and Suren


Sorry for the late reply, I was on holiday.


On 2026/4/30 22:52, Andrew Morton wrote:
> On Thu, 30 Apr 2026 10:02:26 +0800 Hao Ge <hao.ge@xxxxxxxxx> wrote:
>
>> Pages allocated before page_ext is available have their codetag left
>> uninitialized. Track these early PFNs and clear their codetag in
>> clear_early_alloc_pfn_tag_refs() to avoid "alloc_tag was not set"
>> warnings when they are freed later.
>>
>> Currently a fixed-size array of 8192 entries is used, with a warning if
>> the limit is exceeded. However, the number of early allocations depends
>> on the number of CPUs and can be larger than 8192.
>>
>> Replace the fixed-size array with a dynamically allocated linked list
>> of pfn_pool structs. Each node is allocated via alloc_page() and mapped
>> to a pfn_pool containing a next pointer, an atomic slot counter, and a
>> PFN array that fills the remainder of the page.
>>
>> The tracking pages themselves are allocated via alloc_page(), which
>> would trigger __pgalloc_tag_add() -> alloc_tag_add_early_pfn() and
>> recurse indefinitely. Introduce __GFP_NO_CODETAG (reuses the
>> %__GFP_NO_OBJ_EXT bit) and pass gfp_flags through pgalloc_tag_add()
>> so that the early path can skip recording allocations that carry this
>> flag.
>
> Thanks. AI review asked a couple of questions:
>
> https://sashiko.dev/#/patchset/20260430020226.34116-1-hao.ge@xxxxxxxxx

Yes, Sashiko is an excellent tool. Let me carefully go through the questions raised by its review.


About question 1 (__GFP_NO_CODETAG aliasing to __GFP_NO_OBJ_EXT may cause
SLUB page allocations to be erroneously skipped):

No. When SLUB allocates a new slab page via new_slab(), it filters the
flags with (GFP_RECLAIM_MASK | GFP_CONSTRAINT_MASK), which does not
include __GFP_NO_OBJ_EXT:

https://elixir.bootlin.com/linux/v7.0.1/source/mm/slub.c#L3539

So __GFP_NO_OBJ_EXT is never passed through to the page allocator for
slab page allocations.
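
To make the masking argument concrete, here is a small userspace model (not kernel code). All DEMO_* names and bit values are hypothetical and chosen only for illustration; the real masks are defined in the kernel headers. It only shows that a bit outside the two masks cannot survive the filter, so the early path never sees it on a slab page allocation.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int gfp_t;

/* Hypothetical bit layout, for illustration only. */
#define DEMO_GFP_RECLAIM_MASK     ((gfp_t)0x00ff0u)
#define DEMO_GFP_CONSTRAINT_MASK  ((gfp_t)0x03000u)
#define DEMO_GFP_NO_OBJ_EXT       ((gfp_t)0x10000u)
#define DEMO_GFP_NO_CODETAG       DEMO_GFP_NO_OBJ_EXT /* same bit, as in the patch */

/* The argument relies on the aliased bit lying outside both masks. */
static_assert(((DEMO_GFP_RECLAIM_MASK | DEMO_GFP_CONSTRAINT_MASK) &
	       DEMO_GFP_NO_OBJ_EXT) == 0,
	      "aliased bit must not be part of the slab page filter");

/* Models the filtering step: only reclaim and constraint bits survive. */
static gfp_t demo_slab_page_flags(gfp_t caller_flags)
{
	return caller_flags & (DEMO_GFP_RECLAIM_MASK | DEMO_GFP_CONSTRAINT_MASK);
}

/* Models the early path's check for the "do not record" flag. */
static bool demo_early_path_skips(gfp_t page_alloc_flags)
{
	return (page_alloc_flags & DEMO_GFP_NO_CODETAG) != 0;
}
```

With these definitions, demo_early_path_skips(demo_slab_page_flags(flags)) is false for any flags value, including one that carries DEMO_GFP_NO_OBJ_EXT.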


About question 2 (retaining caller's zone/placement flags may cause NULL
pointer dereference or violate mobility constraints):

Sashiko's analysis is correct. As Suren also suggested, it will be fixed in
v5 by masking out GFP_ZONEMASK when allocating tracking pages, so the
internal allocation is no longer constrained by the original caller's zone
requirements.
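
A minimal userspace sketch of that flag derivation follows. This is an assumption about the shape of the v5 change, not the actual patch; the DEMO_* names and bit values are hypothetical.

```c
#include <assert.h>

typedef unsigned int gfp_t;

/* Hypothetical bit values, for illustration only. */
#define DEMO_GFP_DMA        ((gfp_t)0x0001u)
#define DEMO_GFP_HIGHMEM    ((gfp_t)0x0002u)
#define DEMO_GFP_DMA32      ((gfp_t)0x0004u)
#define DEMO_GFP_MOVABLE    ((gfp_t)0x0008u)
#define DEMO_GFP_ZONEMASK   (DEMO_GFP_DMA | DEMO_GFP_HIGHMEM | \
			     DEMO_GFP_DMA32 | DEMO_GFP_MOVABLE)
#define DEMO_GFP_NO_CODETAG ((gfp_t)0x1000u)

/*
 * Derive flags for an internal tracking page from the caller's flags:
 * drop every zone/placement bit, then mark the allocation so that the
 * early path does not try to record it (which would recurse).
 */
static gfp_t demo_tracking_page_flags(gfp_t caller_flags)
{
	return (caller_flags & ~DEMO_GFP_ZONEMASK) | DEMO_GFP_NO_CODETAG;
}
```

For example, a caller that asked for a movable highmem page would yield tracking flags with neither DEMO_GFP_MOVABLE nor DEMO_GFP_HIGHMEM set, while its other bits are kept.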


> Please check to see if there's anything legitimate there?
>
> It also asked some different questions of the v3 patch:
> https://sashiko.dev/#/patchset/20260423083756.157902-1-hao.ge@xxxxxxxxx

For the v3 patch:


About question 3 (race in the cmpxchg failure path: page_ext becomes
available between alloc_page() and __free_page(), triggering "alloc_tag
was not set"):

Good catch. This race was identified during the v3 review and the fix
(calling clear_page_tag_ref() before __free_page()) was added in v4.
Apologies for not documenting it in the v4 changelog.

CPU A (__alloc_tag_add_early_pfn)       CPU B (clear_early_alloc_pfn_tag_refs)

alloc_page() -> codetag uninitialized
                                        page_ext becomes available
cmpxchg() fails
__free_page() -> WARN

Setting the codetag to CODETAG_EMPTY via clear_page_tag_ref() prevents
this warning.
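
The losing side of the race can be modelled in userspace with C11 atomics. This is a simplified sketch, not the patch itself: the demo_* names, the tag enum, and the warning counter are all hypothetical stand-ins, and page_ext availability is reduced to a boolean passed in by the caller.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

enum demo_tag { DEMO_TAG_UNINIT, DEMO_TAG_EMPTY };

struct demo_page {
	enum demo_tag tag;
	struct demo_page *next;
};

static _Atomic(struct demo_page *) demo_pool_head;
static int demo_warnings; /* counts modelled "alloc_tag was not set" warnings */

/* Freeing a page whose tag was never set warns once page_ext is usable. */
static void demo_free_page(struct demo_page *page, bool page_ext_ready)
{
	if (page_ext_ready && page->tag == DEMO_TAG_UNINIT)
		demo_warnings++;
	free(page);
}

/* Stand-in for clear_page_tag_ref(): mark the tag as intentionally empty. */
static void demo_clear_page_tag_ref(struct demo_page *page)
{
	page->tag = DEMO_TAG_EMPTY;
}

/*
 * Try to publish a new node, given the head the caller last observed.
 * Returns true if this caller won. On a lost race the node is discarded,
 * and its tag is cleared first so that the free does not warn.
 */
static bool demo_install_node(struct demo_page *expected, bool page_ext_ready)
{
	struct demo_page *page = malloc(sizeof(*page));

	if (!page)
		return false;
	page->tag = DEMO_TAG_UNINIT; /* allocated before tagging was possible */
	page->next = expected;

	if (atomic_compare_exchange_strong(&demo_pool_head, &expected, page))
		return true;

	demo_clear_page_tag_ref(page);
	demo_free_page(page, page_ext_ready);
	return false;
}
```

Removing the demo_clear_page_tag_ref() call makes the lost-race path increment demo_warnings whenever page_ext_ready is true, which corresponds to the WARN in the diagram.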


About question 4 (accounting leak: tracking page gets fully accounted,
then clear_page_tag_ref() overwrites to CODETAG_EMPTY, __free_page()
skips subtraction):

Yes, this can happen. However, the tracking pages are allocated with
__GFP_NO_CODETAG and are only used during early boot. The leaked
accounting is at most one page per tracking node, which is negligible
for this debug-only code path.
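
For a sense of scale, the node layout from the patch description (a next pointer, an atomic slot counter, and a PFN array filling the rest of the page) can be modelled in userspace as below. The field names, the demo_* identifiers, and the 4 KiB page size are assumptions made for this sketch; they are not taken from the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_PAGE_SIZE 4096u /* assumption: 4 KiB pages */

struct demo_pfn_pool;

struct demo_pfn_pool_hdr {
	struct demo_pfn_pool *next; /* next tracking node in the list */
	atomic_uint used;           /* number of slots handed out so far */
};

/* The PFN array takes whatever space the header leaves in the page. */
#define DEMO_SLOTS \
	((DEMO_PAGE_SIZE - sizeof(struct demo_pfn_pool_hdr)) / sizeof(unsigned long))

struct demo_pfn_pool {
	struct demo_pfn_pool_hdr hdr;
	unsigned long pfns[DEMO_SLOTS];
};

static_assert(sizeof(struct demo_pfn_pool) <= DEMO_PAGE_SIZE,
	      "a tracking node must fit in one page");

/*
 * Reserve a slot with an atomic increment and store the PFN in it.
 * Returns false when the node is full, at which point a real
 * implementation would allocate and link a further node.
 */
static bool demo_pool_record(struct demo_pfn_pool *pool, unsigned long pfn)
{
	unsigned int idx = atomic_fetch_add(&pool->hdr.used, 1);

	if (idx >= DEMO_SLOTS)
		return false;
	pool->pfns[idx] = pfn;
	return true;
}
```

On a typical 64-bit build with 4 KiB pages this works out to 510 PFN slots per node, so one leaked page of accounting corresponds to several hundred tracked early allocations.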


Thanks

Best Regards

Hao