Re: [PATCH v4 07/16] x86/virt/tdx: Add tdx_alloc/free_page() helpers

From: Nikolay Borisov

Date: Thu Nov 27 2025 - 11:11:36 EST




On 21.11.25 г. 2:51 ч., Rick Edgecombe wrote:
From: "Kirill A. Shutemov" <kirill.shutemov@xxxxxxxxxxxxxxx>

Add helpers to use when allocating or preparing pages that need DPAMT
backing. Make them handle races internally for the case of multiple
callers trying operate on the same 2MB range simultaneously.

While the TDX initialization code in arch/x86 uses pages with 2MB
alignment, KVM will need to hand 4KB pages for it to use. Under DPAMT,
these pages will need DPAMT backing 4KB backing.

Add tdx_alloc_page() and tdx_free_page() to handle both page allocation
and DPAMT installation. Make them behave like normal alloc/free functions
where allocation can fail in the case of no memory, but free (with any
necessary DPAMT release) always succeeds. Do this so they can support the
existing TDX flows that require cleanups to succeed. Also create
tdx_pamt_put()/tdx_pamt_get() to handle installing DPAMT 4KB backing for
pages that are already allocated (such as external page tables, or S-EPT
pages).

Allocate the pages as GFP_KERNEL_ACCOUNT based on that the allocations
will be easily user triggerable.

Since the source of these pages is the page allocator, multiple TDs could
each get 4KB pages that are covered by the same 2MB range. When this
happens only one page pair needs to be installed to cover the 2MB range.
Similarly, when one page is freed, the DPAMT backing cannot be freed until
all TDX pages in the range are no longer in use. Have the helpers manage
these races internally.

So the requirements are that:

1. Free path cannot fail (i.e. no TDX module BUSY errors).
2. Allocation paths need to handle finding that DPAMT backing is already
installed, and only return an error in the case of no memory, not in the
case of losing races with other’s trying to operate on the same DPAMT
range.
3. Free paths cannot fail, and also need to clean up the DPAMT backing
when the last page in the 2MB range is no longer needed by TDX.

Previous changes allocated refcounts to be used to track how many 4KB
pages are in use by TDX for each 2MB region. So update those inside the
helpers and use them to decide when to actually install the DPAMT backing
pages.

tdx_pamt_put() needs to guarantee the DPAMT is installed before returning
so that racing threads don’t tell the TDX module to operate on the page
before it’s installed. Take a lock while adjusting the refcount and doing
the actual TDH.PHYMEM.PAMT.ADD/REMOVE to make sure these happen
atomically. The lock is heavyweight, but will be optimized in future
changes. Just do the simple solution before any complex improvements.

TDH.PHYMEM.PAMT.ADD/REMOVE take exclusive locks at the granularity each
2MB range. A simultaneous attempt to operate on the same 2MB region would
result in a BUSY error code returned from the SEAMCALL. Since the
invocation of SEAMCALLs are behind a lock, this won’t conflict.

Besides the contention between TDH.PHYMEM.PAMT.ADD/REMOVE, many other
SEAMCALLs take the same 2MB granularity locks as shared. This means any
attempt to operate on the page by the TDX module while simultaneously
doing PAMT.ADD/REMOVE will result in a BUSY error. This should not happen,
as the PAMT pages always has to be installed before giving the pages to
the TDX module anyway.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx>
[Add feedback, update log]
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@xxxxxxxxx>
---
v4:
- Update tdx_find_pamt_refcount() calls to pass PFN and rely on
internal PMD bucket calculations. Based on changes in previous patch.
- Pull calculation TDX DPAMT 2MB range arg into helper.
- Fix alloc_pamt_array() doesn't zero array on allocation failure (Yan)
- Move "prealloc" comment to future patch. (Kai)
- Use union for dpamt page array. (Dave)
- Use sizeof(*args_array) everywhere instead of sizeof(u64) in some
places. (Dave)
- Fix refcount inc/dec cases. (Xiaoyao)
- Rearrange error handling in tdx_pamt_get()/tdx_pamt_put() to remove
some indented lines.
- Make alloc_pamt_array() use GFP_KERNEL_ACCOUNT like the pre-fault
path does later.

v3:
- Fix hard to follow iteration over struct members.
- Simplify the code by removing the intermediate lists of pages.
- Clear PAMT pages before freeing. (Adrian)
- Rename tdx_nr_pamt_pages(). (Dave)
- Add comments some comments, but thought the simpler code needed
less. So not as much as seem to be requested. (Dave)
- Fix asymmetry in which level of the add/remove helpers global lock is
held.
- Split out optimization.
- Write log.
- Flatten call hierarchies and adjust errors accordingly.
---
arch/x86/include/asm/shared/tdx.h | 7 +
arch/x86/include/asm/tdx.h | 8 +-
arch/x86/virt/vmx/tdx/tdx.c | 258 ++++++++++++++++++++++++++++++
arch/x86/virt/vmx/tdx/tdx.h | 2 +
4 files changed, 274 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/shared/tdx.h b/arch/x86/include/asm/shared/tdx.h
index 6a1646fc2b2f..cc2f251cb791 100644
--- a/arch/x86/include/asm/shared/tdx.h
+++ b/arch/x86/include/asm/shared/tdx.h
@@ -145,6 +145,13 @@ struct tdx_module_args {
u64 rsi;
};
+struct tdx_module_array_args {
+ union {
+ struct tdx_module_args args;
+ u64 args_array[sizeof(struct tdx_module_args) / sizeof(u64)];
+ };
+};
+
/* Used to communicate with the TDX module */
u64 __tdcall(u64 fn, struct tdx_module_args *args);
u64 __tdcall_ret(u64 fn, struct tdx_module_args *args);
diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index cf51ccd16194..914213123d94 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -135,11 +135,17 @@ static inline bool tdx_supports_dynamic_pamt(const struct tdx_sys_info *sysinfo)
return false; /* To be enabled when kernel is ready */
}
+void tdx_quirk_reset_page(struct page *page);
+
int tdx_guest_keyid_alloc(void);
u32 tdx_get_nr_guest_keyids(void);
void tdx_guest_keyid_free(unsigned int keyid);
-void tdx_quirk_reset_page(struct page *page);
+int tdx_pamt_get(struct page *page);
+void tdx_pamt_put(struct page *page);
+
+struct page *tdx_alloc_page(void);
+void tdx_free_page(struct page *page);
struct tdx_td {
/* TD root structure: */
diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index edf9182ed86d..745b308785d6 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -2009,6 +2009,264 @@ u64 tdh_phymem_page_wbinvd_hkid(u64 hkid, struct page *page)
}
EXPORT_SYMBOL_GPL(tdh_phymem_page_wbinvd_hkid);
+/* Number PAMT pages to be provided to TDX module per 2M region of PA */
+static int tdx_dpamt_entry_pages(void)
+{
+ if (!tdx_supports_dynamic_pamt(&tdx_sysinfo))
+ return 0;
+
+ return tdx_sysinfo.tdmr.pamt_4k_entry_size * PTRS_PER_PTE / PAGE_SIZE;
+}

Isn't this guaranteed to return 2 always as per the ABI? Can't the allocation of the 2 pages be moved closer to where it's used - in tdh_phymem_pamt_add which will simplify things a bit?

<snip>