Re: "alloc_tag was not set" when running mm/ksft_hmm.sh

From: David Hildenbrand (Arm)

Date: Tue May 12 2026 - 02:48:14 EST


On 5/12/26 03:28, Alistair Popple wrote:
> On 2026-05-12 at 02:38 +1000, Zenghui Yu <zenghui.yu@xxxxxxxxx> wrote...
>> Hi David,
>>
>> On 5/11/26 8:47 PM, David Hildenbrand (Arm) wrote:
>
> Thanks. I have reproduced it now that my fingers are skinnier.
>
>>>
>>>
>>> zone_device_private_split_cb(), that ends up calling ->folio_split().
>>>
>>> We do have a call to pgalloc_tag_split() in __split_unmapped_folio(), invoked in
>>> __folio_freeze_and_split_unmapped() before calling
>>> zone_device_private_split_cb() when iterating the folios.
>>
>> If I read the code correctly, pgalloc_tag_split() in
>> __split_unmapped_folio() deals with device private pages' alloc tag. But
>> what alloc_tag_sub_check() warns on are real system memory pages (device
>> page's backing page), which are allocated by
>> dmirror_devmem_alloc_page()/folio_page().
>>
>> static void dmirror_devmem_folio_split(struct folio *head, struct folio *tail)
>> {
>> 	struct page *rpage = BACKING_PAGE(folio_page(head, 0));
>>
>> Thanks,
>> Zenghui
>>
>>> The zone_device_private_split_cb(folio, NULL); is then called on the first folio
>>> after looping over the other (new) folios.
>>>
>>> I would assume that __folio_freeze_and_split_unmapped() would already do the
>>> right thing?
>
> Well, you know what they say about assumptions :) Although in this case
> __folio_freeze_and_split_unmapped() isn't called on the backing page anyway
> (it's called to split the ZONE_DEVICE page, not the page simulating device
> memory).

Now my brain hurts :)

> The problem is we're not splitting the tag associated with the backing
> page for the simulated memory.
>
> I came up with the below fix last night, but I suspect it will quite reasonably
> get NACKed on the basis of the symbol export, so I was looking at other solutions.

I think there are other problems ...

>
> The simulated memory should just be used like a bare physical address range. So
> there really is no reason for the backing page simulating device memory to be
> allocated as a higher order folio. Using the struct page to store some metadata
> for the simulated device is convenient though to avoid creating a test-specific
> data structure for this. So I am looking at going back to always allocating the
> simulated backing memory as order-0 pages in the test, which is how it worked
> prior to the introduction of large device pages. That was causing a crash I have
> yet to debug, though.
>

... such as doing a folio_page(folio_alloc()), followed by a __free_pages().

Why are we even allocating folios here and manually splitting them?

Looking at dmirror_devmem_folio_split(), aren't we using folios here for
something that ... is not a folio?

Likely we really shouldn't be using folios here ... :)
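
To make the failure mode concrete, here is a toy userspace model of what I think
is going on. It is not kernel code: every toy_* name is made up for illustration,
and it assumes that only the head page of a higher-order allocation carries a
tag and that freeing an untagged page is what triggers the warning.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define MAX_PAGES 16

/* Hypothetical stand-in for a page with an allocation tag reference. */
struct toy_page {
	const char *tag;	/* NULL models "alloc_tag was not set" */
	bool allocated;
};

static struct toy_page toy_mem[MAX_PAGES];

/* Allocate 2^order pages as one block; only the head page gets the tag. */
static struct toy_page *toy_alloc(unsigned int order, const char *tag)
{
	size_t nr = (size_t)1 << order;

	assert(nr <= MAX_PAGES);
	for (size_t i = 0; i < nr; i++) {
		toy_mem[i].allocated = true;
		toy_mem[i].tag = NULL;
	}
	toy_mem[0].tag = tag;
	return &toy_mem[0];
}

/* Model of a tag split: copy the head's tag to every tail page. */
static void toy_tag_split(struct toy_page *head, unsigned int order)
{
	size_t nr = (size_t)1 << order;

	for (size_t i = 1; i < nr; i++)
		head[i].tag = head->tag;
}

/* Free one order-0 page; returns true if its tag was missing. */
static bool toy_free_page(struct toy_page *page)
{
	bool missing = (page->tag == NULL);

	if (missing)
		fprintf(stderr, "toy: alloc_tag was not set\n");
	page->tag = NULL;
	page->allocated = false;
	return missing;
}

/* Allocate a block, optionally split the tag, then free page by page. */
static unsigned int toy_run(unsigned int order, bool split_tags)
{
	struct toy_page *head = toy_alloc(order, "toy_test_hmm");
	size_t nr = (size_t)1 << order;
	unsigned int warnings = 0;

	if (split_tags)
		toy_tag_split(head, order);
	for (size_t i = 0; i < nr; i++)
		warnings += toy_free_page(&head[i]);
	return warnings;
}
```

In this model, toy_run(2, false) reports three missing tags (one per tail page),
while toy_run(2, true) reports none. Allocating order-0 pages from the start,
i.e. toy_run(0, false), avoids the question entirely, which is roughly why I
doubt we want folios here.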

--
Cheers,

David