Re: [PATCH 03/49] mm/sparse: fix vmemmap page accounting for HVOed DAX

From: Muchun Song

Date: Mon Apr 13 2026 - 22:34:16 EST




> On Apr 14, 2026, at 02:41, David Hildenbrand (Arm) <david@xxxxxxxxxx> wrote:
>
> On 4/5/26 14:51, Muchun Song wrote:
>> When HVO is enabled for DAX, the vmemmap page accounting is wrong since
>> it only accounts for non-HVO case.
>>
>> Fix the accounting by introducing section_vmemmap_pages() that returns
>> the exact number of vmemmap pages needed for the given pfn range.
>>
>> Fixes: 15995a352474 ("mm: report per-page metadata information")
>> Signed-off-by: Muchun Song <songmuchun@xxxxxxxxxxxxx>
>> ---
>> mm/sparse-vmemmap.c | 30 ++++++++++++++++++++++++++----
>> 1 file changed, 26 insertions(+), 4 deletions(-)
>>
>> diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
>> index 7aa9a97498eb..0ef96b1afbcc 100644
>> --- a/mm/sparse-vmemmap.c
>> +++ b/mm/sparse-vmemmap.c
>> @@ -724,6 +724,27 @@ static int fill_subsection_map(unsigned long pfn, unsigned long nr_pages)
>> return rc;
>> }
>>
>> +static int __meminit section_vmemmap_pages(unsigned long pfn, unsigned long nr_pages,
>> + struct vmem_altmap *altmap, struct dev_pagemap *pgmap)
>> +{
>> + unsigned int order = pgmap ? pgmap->vmemmap_shift : 0;
>> + unsigned long pages_per_compound = 1L << order;
>> +
>> + VM_BUG_ON(!IS_ALIGNED(pfn | nr_pages, min(pages_per_compound, PAGES_PER_SECTION)));
>> + VM_BUG_ON(pfn_to_section_nr(pfn) != pfn_to_section_nr(pfn + nr_pages - 1));
>> +
>> + if (!vmemmap_can_optimize(altmap, pgmap))
>> + return DIV_ROUND_UP(nr_pages * sizeof(struct page), PAGE_SIZE);
>> +
>> + if (order < PFN_SECTION_SHIFT)
>> + return VMEMMAP_RESERVE_NR * nr_pages / pages_per_compound;
>> +
>> + if (IS_ALIGNED(pfn, pages_per_compound))
>> + return VMEMMAP_RESERVE_NR;
>
> This makes me wonder whether that is really the right place to update
> the counter. Can't we update the counter where we actually allocate/free
> the pages, so we don't have to re-calculate that?

The vmemmap pages are allocated via vmemmap_populate(). Architectures
may provide their own vmemmap_populate() (though this is relatively
rare), so accounting at the allocation site would be scattered across
several places. vmemmap_free() is worse: there is no generic
implementation at all, and every architecture implements its own, so
the accounting would have to be duplicated in each of them. That
would make the implementation considerably more complex.
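
For illustration only, the arithmetic that the quoted helper keeps in
one generic place can be sketched as a userspace C function. This is
not the patch code: the sketch_* names and the constant values (4 KiB
pages, 64-byte struct page, PFN_SECTION_SHIFT of 15, VMEMMAP_RESERVE_NR
of 2) are assumptions chosen to resemble a typical x86-64
configuration, and the branch for order >= PFN_SECTION_SHIFT is not
modelled because the quoted hunk is cut off there.

```c
#include <stdbool.h>

/* Assumed values for illustration; the kernel derives these per config. */
#define SKETCH_PAGE_SIZE          4096UL
#define SKETCH_STRUCT_PAGE_SIZE   64UL
#define SKETCH_PFN_SECTION_SHIFT  15U
#define SKETCH_VMEMMAP_RESERVE_NR 2UL

/*
 * Hypothetical userspace model of the first two branches of the quoted
 * section_vmemmap_pages(). Returns the number of vmemmap pages backing
 * nr_pages of memory, or -1 for the case this sketch does not model.
 */
static long sketch_vmemmap_pages(unsigned long nr_pages, unsigned int order,
                                 bool optimized)
{
        unsigned long pages_per_compound = 1UL << order;

        /* Non-HVO: one struct page per pfn, rounded up to whole pages. */
        if (!optimized)
                return (long)((nr_pages * SKETCH_STRUCT_PAGE_SIZE +
                               SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE);

        /* HVO, compound page smaller than a section: a fixed number of
         * vmemmap pages is kept per compound page. */
        if (order < SKETCH_PFN_SECTION_SHIFT)
                return (long)(SKETCH_VMEMMAP_RESERVE_NR * nr_pages /
                              pages_per_compound);

        /* Compound page spans one or more sections: not modelled here. */
        return -1;
}
```

Under these assumed values, a full section of 32768 pages needs 512
vmemmap pages without HVO, and 128 with HVO and order-9 (2 MiB)
compound pages.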

Thanks,
Muchun

>
> --
> Cheers,
>
> David