Re: "alloc_tag was not set" when running mm/ksft_hmm.sh

From: Alistair Popple

Date: Tue May 12 2026 - 03:52:01 EST


On 2026-05-12 at 16:47 +1000, "David Hildenbrand (Arm)" <david@xxxxxxxxxx> wrote...
> On 5/12/26 03:28, Alistair Popple wrote:
> > On 2026-05-12 at 02:38 +1000, Zenghui Yu <zenghui.yu@xxxxxxxxx> wrote...
> >> Hi David,
> >>
> >> On 5/11/26 8:47 PM, David Hildenbrand (Arm) wrote:
> >
> > Thanks. I have reproduced it now that my fingers are skinnier.
> >
> >>>
> >>>
> >>> zone_device_private_split_cb(), that ends up calling ->folio_split().
> >>>
> >>> We do have a call to pgalloc_tag_split() in __split_unmapped_folio(), invoked in
> >>> __folio_freeze_and_split_unmapped() before calling
> >>> zone_device_private_split_cb() when iterating the folios.
> >>
> >> If I read the code correctly, pgalloc_tag_split() in
> >> __split_unmapped_folio() deals with device private pages' alloc tag. But
> >> what alloc_tag_sub_check() warns on are real system memory pages (device
> >> page's backing page), which are allocated by
> >> dmirror_devmem_alloc_page()/folio_page().
> >>
> >> static void dmirror_devmem_folio_split(struct folio *head, struct folio *tail)
> >> {
> >>         struct page *rpage = BACKING_PAGE(folio_page(head, 0));
> >>
> >> Thanks,
> >> Zenghui
> >>
> >>> The zone_device_private_split_cb(folio, NULL); is then called on the first folio
> >>> after looping over the other (new) folios.
> >>>
> >>> I would assume that __folio_freeze_and_split_unmapped() would already do the
> >>> right thing?
> >
> > Well you know what they say about assumptions :) Although in this case
> > __folio_freeze_and_split_unmapped() isn't called on the backing page anyway
> > (it's called to split the ZONE_DEVICE page, not the page simulating device
> > memory).
>
> Now my brain hurts :)

I have never liked this bit of the HMM selftests. It has always made my brain
hurt.

> > The problem is we're not splitting the tag associated with the backing
> > page for the simulated memory.
> >
> > I came up with the below fix last night, but I suspect it will quite
> > reasonably get NACKed on the basis of the symbol export, so I was looking at
> > other solutions.
>
> I think there are other problems ...
>
> >
> > The simulated memory should just be used like a bare physical address range. So
> > there really is no reason for the backing page simulating device memory to be
> > allocated as a higher-order folio. Using the struct page to store some metadata
> > for the simulated device is convenient, though, as it avoids creating a
> > test-specific data structure. So I am looking at going back to always
> > allocating the simulated backing memory as order-0 pages in the test, which is
> > what it did prior to the introduction of large device pages. That was causing
> > a crash I have yet to debug, though.
> >
>
> ... such as doing a folio_page(folio_alloc()), followed by a __free_pages().
>
> Why are we even allocating folios here and manually splitting them?
>
> Looking at dmirror_devmem_folio_split(), aren't we using folios here for
> something that ... is not a folio?
>
> Likely we really shouldn't be using folios here ... :)

Exactly my point, just more succinct :)

I just need to make it work without doing that.
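
For anyone following along, here is a toy userspace model of the failure mode
as I understand it. It is not kernel code, and every name in it (toy_tag,
toy_page, toy_alloc, toy_split, toy_free_one, toy_run) is made up for
illustration. The assumption it encodes is that a higher-order allocation
records its tag only on the head page, so splitting without propagating the
tag and then freeing page by page hits untagged tail pages.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-ins only; none of these are kernel types or functions. */
struct toy_tag  { long live_pages; };       /* per-call-site page counter */
struct toy_page { struct toy_tag *tag; };   /* per-page tag reference */

/* Allocate 2^order pages; only the head page records the tag. */
static struct toy_page *toy_alloc(struct toy_tag *tag, unsigned int order)
{
	size_t n = (size_t)1 << order;
	struct toy_page *pages = calloc(n, sizeof(*pages));

	if (!pages)
		return NULL;
	pages[0].tag = tag;
	tag->live_pages += (long)n;
	return pages;
}

/* Split into order-0 pages, optionally copying the head's tag to the tails. */
static void toy_split(struct toy_page *pages, unsigned int order, int split_tag)
{
	size_t n = (size_t)1 << order;

	if (!split_tag)
		return;
	for (size_t i = 1; i < n; i++)
		pages[i].tag = pages[0].tag;
}

/* Free one order-0 page; returns 1 if the page had no tag set. */
static int toy_free_one(struct toy_page *page)
{
	if (!page->tag)
		return 1;
	page->tag->live_pages--;
	page->tag = NULL;
	return 0;
}

/* Allocate, split, free every page individually; count untagged frees. */
static int toy_run(unsigned int order, int split_tag)
{
	struct toy_tag tag = { 0 };
	struct toy_page *pages = toy_alloc(&tag, order);
	size_t n = (size_t)1 << order;
	int untagged = 0;

	assert(pages != NULL);
	toy_split(pages, order, split_tag);
	for (size_t i = 0; i < n; i++)
		untagged += toy_free_one(&pages[i]);
	free(pages);
	return untagged;
}
```

In this model, toy_run(2, 0) reports three untagged frees (the tail pages),
while toy_run(2, 1) and toy_run(0, 0) report none. The last case is the
order-0 approach: if nothing higher-order is ever allocated, there is nothing
to split.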

- Alistair

> --
> Cheers,
>
> David