Re: [PATCH 02/15] x86/virt/tdx: Add extra memory to TDX Module for Extensions
From: Dan Williams (nvidia)
Date: Fri Jun 12 2026 - 19:49:47 EST
Xu Yilun wrote:
> TDX Module introduces a new concept called "TDX Module Extensions" to
> support long running / hard-irq preemptible flows inside. This makes TDX
> Module capable of handling complex tasks through "Extension SEAMCALLs".
> Adding more memory to TDX Module is the first step to enable Extensions.
Like I said on the cover, I think "long running hard-irq preemptible"
invites more questions than it answers. The service calls are not "long
running" on their own. I think it is sufficient to say they are
resumable unlike typical calls that run to completion while monopolizing
the CPU.
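
To make "resumable" concrete, here is a rough userspace sketch of the
calling convention I have in mind. Everything in it (the mock_* names,
the status codes, the 4-item budget) is made up for illustration and is
not the TDX Module ABI. The only point is that the callee writes its
progress back into the argument block, so the caller reissues the same
call until it stops reporting "interrupted".

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical status codes, for illustration only. */
enum mock_status { MOCK_SUCCESS = 0, MOCK_INTERRUPTED_RESUMABLE = 1 };

/* Hypothetical argument block: the callee updates @next in place. */
struct mock_args {
	uint64_t next;	/* in/out: first unprocessed item */
	uint64_t last;	/* in: last item to process */
};

/* Pretend an interrupt arrives after every MOCK_BUDGET items. */
#define MOCK_BUDGET 4

/*
 * Process at most MOCK_BUDGET items, then yield. The cursor lives in
 * @args, so the caller can reissue the call without touching it.
 */
static enum mock_status mock_resumable_call(struct mock_args *args)
{
	unsigned int done = 0;

	while (args->next <= args->last) {
		if (done == MOCK_BUDGET)
			return MOCK_INTERRUPTED_RESUMABLE;
		args->next++;
		done++;
	}

	return MOCK_SUCCESS;
}

/* Retry until completion; returns how many calls that took. */
static unsigned int mock_call_until_done(struct mock_args *args)
{
	unsigned int calls = 0;
	enum mock_status s;

	do {
		s = mock_resumable_call(args);
		calls++;
	} while (s == MOCK_INTERRUPTED_RESUMABLE);

	/* On success the cursor has moved just past the last item. */
	assert(args->next == args->last + 1);

	return calls;
}
```

No single call holds the CPU for more than one budget's worth of work,
which is the property the changelog is trying to describe.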
> Currently, TDX Module memory use is relatively static. But, the
> Extensions need to use memory more dynamically. While 'static' here
> means the kernel provides necessary amount of memory to TDX Module for
> its basic functionalities, 'dynamic' means extra memory is needed only
> if new add-on features are to be enabled. So add a new memory feeding
> process backed by a new SEAMCALL TDH.EXT.MEM.ADD.
Rick commented on this as well, but a simpler way to say it is that
extensions receive a one-time memory pool allocation at init time. The
extension uses that pool as its baseline for its own internal state and
data for the service APIs it offers.
> The process is mostly the same as adding PAMT. The kernel queries TDX
> Module how much memory needed, allocates it, hands it over, and never
> gets it back.
>
> TDH.EXT.MEM.ADD uses a new parameter type HPA_LIST_INFO to provide
> control (private) pages to TDX Module. This type represents a list of
> pages for TDX Module to access. It needs a 'root page' which contains
> the list of HPAs of the pages. It collapses the HPA of the root page
> and the number of valid HPAs into a 64 bit raw value for SEAMCALL
> parameters. The root page is always a medium, TDX Module never keeps
> the root page.
I mention this below, but I do not think the reader cares that the TDX
Module calls an array of physical addresses a "root" page.
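
What the reader does need is the packing itself. A standalone userspace
sketch follows, assuming only the field layout from the GENMASK_U64()
definitions in this patch (first entry in bits 11:3, PFN in bits 51:12,
last entry in bits 63:55). The helper names are hypothetical and this is
not a proposal for the kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Field positions taken from the patch's GENMASK_U64() definitions. */
#define FIRST_ENTRY_SHIFT	3	/* bits 11:3  */
#define PFN_SHIFT		12	/* bits 51:12 */
#define PFN_BITS		40
#define LAST_ENTRY_SHIFT	55	/* bits 63:55 */
#define LAST_ENTRY_BITS		9

/* Mask of the low @bits bits; only used with bits < 64 here. */
static uint64_t field_mask(unsigned int bits)
{
	return (UINT64_C(1) << bits) - 1;
}

/* Pack the PFN of the list page and the entry count into one u64. */
static uint64_t hpa_list_info_pack(uint64_t list_pfn, unsigned int nr_entries)
{
	/* A 4K list page holds at most 512 8-byte entries. */
	assert(nr_entries >= 1 && nr_entries <= 512);
	assert(list_pfn <= field_mask(PFN_BITS));

	return (UINT64_C(0) << FIRST_ENTRY_SHIFT) |	/* always start at entry 0 */
	       (list_pfn << PFN_SHIFT) |
	       ((uint64_t)(nr_entries - 1) << LAST_ENTRY_SHIFT);
}

/* Recover the PFN of the list page. */
static uint64_t hpa_list_info_pfn(uint64_t info)
{
	return (info >> PFN_SHIFT) & field_mask(PFN_BITS);
}

/* Recover the number of valid entries (last index + 1). */
static unsigned int hpa_list_info_nr_entries(uint64_t info)
{
	return (unsigned int)((info >> LAST_ENTRY_SHIFT) &
			      field_mask(LAST_ENTRY_BITS)) + 1;
}
```

Note that the last-entry field is an index, not a count, hence the
"nr_entries - 1" on the way in and the "+ 1" on the way out.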
>
> Introduce a tdx_clflush_hpa_list() helper to flush shared cache before
> SEAMCALL, to avoid shared cache writeback damaging these private pages.
>
> For now, TDX Module Extensions consumes relatively large amount of
> memory (~50MB). Use contiguous page allocation to avoid permanently
> fragment too much memory. Print the allocation amount on TDX Module
> Extensions initialization for visibility.
To be clear, I believe there is a low chance of fragmentation given that
this allocation happens early. However, at 10s of MB, and given the
benefit of isolating blocks of PFNs that will never be returned, it makes
sense to not use the buddy allocator for that.
> Co-developed-by: Zhenzhong Duan <zhenzhong.duan@xxxxxxxxx>
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@xxxxxxxxx>
> Signed-off-by: Xu Yilun <yilun.xu@xxxxxxxxxxxxxxx>
> ---
> arch/x86/virt/vmx/tdx/tdx.h | 1 +
> arch/x86/virt/vmx/tdx/tdx.c | 118 ++++++++++++++++++++++++++++++++++++
> 2 files changed, 119 insertions(+)
>
> diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
> index a5eec8e3cc71..2335f88bbb10 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.h
> +++ b/arch/x86/virt/vmx/tdx/tdx.h
> @@ -46,6 +46,7 @@
> #define TDH_PHYMEM_PAGE_WBINVD 41
> #define TDH_VP_WR 43
> #define TDH_SYS_CONFIG 45
> +#define TDH_EXT_MEM_ADD 61
> #define TDH_SYS_DISABLE 69
>
> /*
> diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> index c0c6281b08a5..622399d8da68 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.c
> +++ b/arch/x86/virt/vmx/tdx/tdx.c
> @@ -31,6 +31,7 @@
> #include <linux/syscore_ops.h>
> #include <linux/idr.h>
> #include <linux/kvm_types.h>
> +#include <linux/bitfield.h>
> #include <asm/page.h>
> #include <asm/special_insns.h>
> #include <asm/msr-index.h>
> @@ -1179,6 +1180,123 @@ static __init int init_tdmrs(struct tdmr_info_list *tdmr_list)
> return 0;
> }
>
> +static void tdx_clflush_hpa_list(struct page *root, unsigned int nr_pages)
> +{
> + u64 *entries = page_to_virt(root);
> + int i;
> +
> + for (i = 0; i < nr_pages; i++)
> + clflush_cache_range(__va(entries[i]), PAGE_SIZE);
> +}
> +
> +#define HPA_LIST_INFO_FIRST_ENTRY GENMASK_U64(11, 3)
> +#define HPA_LIST_INFO_PFN GENMASK_U64(51, 12)
> +#define HPA_LIST_INFO_LAST_ENTRY GENMASK_U64(63, 55)
> +
> +static u64 to_hpa_list_info(struct page *root, unsigned int nr_pages)
> +{
> + return FIELD_PREP(HPA_LIST_INFO_FIRST_ENTRY, 0) |
> + FIELD_PREP(HPA_LIST_INFO_PFN, page_to_pfn(root)) |
> + FIELD_PREP(HPA_LIST_INFO_LAST_ENTRY, nr_pages - 1);
> +}
> +
> +static int tdx_ext_mem_add(struct page *root, unsigned int nr_pages)
> +{
> + struct tdx_module_args args = {
> + .rcx = to_hpa_list_info(root, nr_pages),
> + };
> + u64 r;
> +
> + tdx_clflush_hpa_list(root, nr_pages);
> +
> + do {
> + /*
> + * TDH_EXT_MEM_ADD is designed to use output parameter RCX to
> + * override/update input parameter RCX, so the caller doesn't
> + * have to do manual parameter update on retry call.
> + */
> + r = seamcall_ret(TDH_EXT_MEM_ADD, &args);
> + } while (r == TDX_INTERRUPTED_RESUMABLE);
> +
> + if (r != TDX_SUCCESS)
> + return -EFAULT;
> +
> + return 0;
> +}
> +
> +static int tdx_ext_mem_setup(void)
> +{
> + unsigned int nr_pages;
> + struct page *page;
> + u64 *root;
> + unsigned int i;
> + int ret;
> +
> + nr_pages = tdx_sysinfo.ext.memory_pool_required_pages;
> + /*
> + * memory_pool_required_pages == 0 means no need to add pages,
> + * skip the memory setup.
> + */
> + if (!nr_pages)
> + return 0;
> +
> + root = kzalloc(PAGE_SIZE, GFP_KERNEL);
> + if (!root)
> + return -ENOMEM;
I think this "root" term is a holdover from the complicated TDX Connect
case where it might sometimes be this odd "singleton" object? You could
just make it this, for actual type safety:
struct tdx_hpa_list {
	u64 phys[PAGE_SIZE / sizeof(u64)];
};
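
As a rough illustration of what the type buys you, here is a userspace
approximation. SKETCH_PAGE_SIZE and SKETCH_ARRAY_SIZE are stand-ins for
the kernel's PAGE_SIZE and ARRAY_SIZE(), uint64_t stands in for u64, and
tdx_hpa_list_capacity() is a hypothetical helper. The size check happens
at compile time and the capacity falls out of the type instead of being
recomputed at each use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096	/* assumption: 4K pages, as on x86 */
#define SKETCH_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* One page worth of physical addresses. */
struct tdx_hpa_list {
	uint64_t phys[SKETCH_PAGE_SIZE / sizeof(uint64_t)];
};

/* The list is handed over by PFN, so it must be exactly one page. */
_Static_assert(sizeof(struct tdx_hpa_list) == SKETCH_PAGE_SIZE,
	       "tdx_hpa_list must fill exactly one page");

/* Number of entries the list can hold, derived from the type. */
static size_t tdx_hpa_list_capacity(const struct tdx_hpa_list *list)
{
	return SKETCH_ARRAY_SIZE(list->phys);
}
```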
> +
> + page = alloc_contig_pages(nr_pages, GFP_KERNEL, numa_mem_id(),
> + &node_online_map);
> + if (!page) {
> + ret = -ENOMEM;
> + goto out_free_root;
> + }
> +
> + for (i = 0; i < nr_pages;) {
> + unsigned int nents = min(nr_pages - i,
> + PAGE_SIZE / sizeof(*root));
This looks wrong: is that sizeof(struct page), or the size of a physical
address? It becomes less error prone if you do:
	min(nr_pages - i, ARRAY_SIZE(hpa_list->phys))
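
For the avoidance of doubt, this is roughly the loop shape I mean,
written as an untested userspace sketch. mock_mem_add() and struct
batch_log are hypothetical stand-ins for tdx_ext_mem_add() and exist only
so the batching can be observed. The SKETCH_* macros approximate the
kernel's PAGE_SIZE, ARRAY_SIZE() and min().

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096	/* assumption: 4K pages */
#define SKETCH_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))
#define SKETCH_MIN(a, b) ((a) < (b) ? (a) : (b))

struct tdx_hpa_list {
	uint64_t phys[SKETCH_PAGE_SIZE / sizeof(uint64_t)];
};

/* Hypothetical record of what the mock "SEAMCALL" was handed. */
struct batch_log {
	unsigned int calls;
	unsigned int pages;
	uint64_t last_phys;
};

/* Stand-in for tdx_ext_mem_add(): just records the batch. */
static void mock_mem_add(const struct tdx_hpa_list *list, unsigned int nents,
			 struct batch_log *log)
{
	log->calls++;
	log->pages += nents;
	log->last_phys = list->phys[nents - 1];
}

/* Feed @nr_pages contiguous pages starting at @base_phys, one list at a time. */
static void feed_pages(uint64_t base_phys, unsigned int nr_pages,
		       struct batch_log *log)
{
	struct tdx_hpa_list list;

	for (unsigned int i = 0; i < nr_pages; ) {
		/* Batch size is bounded by the list type, not by open-coded math. */
		unsigned int nents = SKETCH_MIN(nr_pages - i,
				(unsigned int)SKETCH_ARRAY_SIZE(list.phys));

		for (unsigned int j = 0; j < nents; j++)
			list.phys[j] = base_phys +
				       (uint64_t)(i + j) * SKETCH_PAGE_SIZE;

		mock_mem_add(&list, nents, log);
		i += nents;
	}
}
```

With 1300 pages that comes out as three batches of 512, 512 and 276
entries, and the same list page is reused for each batch.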
> + int j;
> +
> + for (j = 0; j < nents; j++)
You can declare j in the for loop.
> + root[j] = page_to_phys(page + i + j);
> +
> + ret = tdx_ext_mem_add(virt_to_page(root), nents);
> + /*
> + * No SEAMCALLs to reclaim the added pages. For simple error
> + * handling, leak all pages.
> + */
> + WARN_ON_ONCE(ret);
Perhaps, to be friendlier to folks without the source code in front of
them, drop the comment and do:
	WARN(ret, "Fatal: TDX Module failed (%d) to accept memory, stranded %u pages\n", ret, nr_pages);
...the once flavor is not needed, right? It's toast at this point.