Re: [PATCH v4 06/16] x86/virt/tdx: Improve PAMT refcounts allocation for sparse memory
From: Nikolay Borisov
Date: Wed Nov 26 2025 - 09:45:54 EST
On 21.11.25 at 2:51, Rick Edgecombe wrote:
From: "Kirill A. Shutemov" <kirill.shutemov@xxxxxxxxxxxxxxx>
init_pamt_metadata() allocates PAMT refcounts for all physical memory up
to max_pfn. It might be suboptimal if the physical memory layout is
discontinuous and has large holes.
The refcount allocation vmalloc allocation. This is necessary to support a
nit: Something's odd with the first sentence; perhaps an "is a" is missing before "vmalloc"?
large allocation size. The virtually contiguous property also makes it
easy to find a specific 2MB range’s refcount since it can simply be
indexed.
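To make the indexing concrete, here is a small userspace sketch of the arithmetic, not code from the patch. It assumes the usual x86-64 values PAGE_SHIFT = 12 and PMD_SHIFT = 21 and a 4-byte counter (atomic_t wraps an int); the helper names pamt_refcount_index() and pamt_refcount_bytes() are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed x86-64 values with 4 KiB base pages; not taken from the patch. */
#define PAGE_SHIFT 12
#define PMD_SHIFT  21

static_assert(PMD_SHIFT > PAGE_SHIFT, "a PMD must span more than one page");

/* Index of the 2 MiB range (and so of the refcount) that contains pfn. */
static uint64_t pamt_refcount_index(uint64_t pfn)
{
	return pfn >> (PMD_SHIFT - PAGE_SHIFT);
}

/* Bytes of refcounts needed to cover [0, max_pfn): one counter per 2 MiB. */
static uint64_t pamt_refcount_bytes(uint64_t max_pfn)
{
	uint64_t per_pmd = 1ULL << (PMD_SHIFT - PAGE_SHIFT);  /* 512 pfns */
	uint64_t entries = (max_pfn + per_pmd - 1) / per_pmd; /* round up */

	return entries * sizeof(int32_t); /* 4-byte counter, as with atomic_t */
}
```

Under these assumptions, 1 TiB of address space is 2^28 pfns, which gives 2^19 counters and 2 MiB of refcounts, matching the figure in the comment the patch removes.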
Since vmalloc mappings support remapping during normal kernel runtime,
switch to an approach that only populates refcount pages for the vmalloc
mapping when there is actually memory for that range. This means any holes
in the physical address space won’t use actual physical memory.
The validity of this memory optimization is based on a couple assumptions:
1. Physical holes in the ram layout are commonly large enough for it to be
worth it.
2. An alternative approach that looks the refcounts via some more layered
data structure wouldn’t overly complicate the lookups. Or at least
more than the complexity of managing the vmalloc mapping.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx>
[Add feedback, update log]
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@xxxxxxxxx>
<snip>
---
arch/x86/virt/vmx/tdx/tdx.c | 136 +++++++++++++++++++++++++++++++++---
1 file changed, 125 insertions(+), 11 deletions(-)
diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index c28d4d11736c..edf9182ed86d 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -194,30 +194,135 @@ int tdx_cpu_enable(void)
}
EXPORT_SYMBOL_GPL(tdx_cpu_enable);
-/*
- * Allocate PAMT reference counters for all physical memory.
- *
- * It consumes 2MiB for every 1TiB of physical memory.
- */
-static int init_pamt_metadata(void)
+/* Find PAMT refcount for a given physical address */
+static atomic_t *tdx_find_pamt_refcount(unsigned long pfn)
{
- size_t size = DIV_ROUND_UP(max_pfn, PTRS_PER_PTE) * sizeof(*pamt_refcounts);
+ /* Find which PMD a PFN is in. */
+ unsigned long index = pfn >> (PMD_SHIFT - PAGE_SHIFT);
- if (!tdx_supports_dynamic_pamt(&tdx_sysinfo))
- return 0;
+ return &pamt_refcounts[index];
+}
- pamt_refcounts = __vmalloc(size, GFP_KERNEL | __GFP_ZERO);
- if (!pamt_refcounts)
+/* Map a page into the PAMT refcount vmalloc region */
+static int pamt_refcount_populate(pte_t *pte, unsigned long addr, void *data)
+{
+ struct page *page;
+ pte_t entry;
+
+ page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+ if (!page)
return -ENOMEM;
+ entry = mk_pte(page, PAGE_KERNEL);
+
+ spin_lock(&init_mm.page_table_lock);
+ /*
+ * PAMT refcount populations can overlap due to rounding of the
+ * start/end pfn. Make sure the PAMT range is only populated once.
+ */
+ if (pte_none(ptep_get(pte)))
+ set_pte_at(&init_mm, addr, pte, entry);
+ else
+ __free_page(page);
+ spin_unlock(&init_mm.page_table_lock);
nit: Wouldn't it be better to perform the pte_none() check before doing the allocation, thus avoiding needless allocations? I.e., do the alloc/mk_pte only once we are 100% sure we are going to use this entry.
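Something along these lines, as an untested userspace model rather than kernel code. The names slots, nr_allocs and populate_slot() are hypothetical stand-ins for the PTEs and the populate callback. One caveat: alloc_page(GFP_KERNEL) may sleep, so it cannot be called with the spinlock held; the early check would presumably have to be an unlocked peek, with the recheck under the lock kept to handle a racing populate.

```c
#include <assert.h>
#include <stdlib.h>

#define NR_SLOTS 8

static void *slots[NR_SLOTS]; /* stand-in for the PTEs of the vmalloc range */
static int nr_allocs;         /* counts backing-page allocations */

/* Hypothetical model of the populate callback with an early check. */
static int populate_slot(unsigned int idx)
{
	void *page;

	assert(idx < NR_SLOTS);

	/* Early peek: skip the allocation if the slot is already backed. */
	if (slots[idx])
		return 0;

	page = calloc(1, 4096);
	if (!page)
		return -1;
	nr_allocs++;

	/* In the kernel this recheck would sit under init_mm.page_table_lock. */
	if (!slots[idx])
		slots[idx] = page;
	else
		free(page);

	return 0;
}
```

With this shape, a second populate of an already-backed slot returns before allocating anything, so overlapping ranges caused by start/end pfn rounding would only pay for the check.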
+
return 0;
}
<snip>