A TDX module initialization failure was reported on an Emerald Rapids
platform:
  virt/tdx: initialization failed: TDMR [0x0, 0x80000000): reserved areas exhausted.
  virt/tdx: module initialization failed (-28)
As a step of initializing the TDX module, the kernel tells the TDX
module all the "TDX-usable memory regions" via a set of structures
defined by the TDX architecture, called "TD Memory Regions" (TDMRs).
Each TDMR must be 1GB-aligned and sized in 1GB granularity, and all
"non-TDX-usable memory holes" in a given TDMR must be marked as
"reserved areas". Each TDMR supports only a limited number of reserved
areas; the maximum is reported by the TDX module.
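The 1GB alignment rule can be illustrated with a minimal userspace
sketch. It is not the kernel's code: struct range, tdmr_align_down(),
tdmr_align_up() and tdmr_cover() are hypothetical names used only for
illustration, and the sketch assumes a TDMR is simply the memory range
rounded outward to 1GB boundaries.

```c
#include <assert.h>
#include <stdint.h>

#define TDMR_ALIGNMENT (1ULL << 30) /* 1GB, as stated in the text above */

/* Hypothetical type: a half-open physical address range [start, end). */
struct range {
	uint64_t start;
	uint64_t end;
};

/* Hypothetical helper: round @addr down to a 1GB boundary. */
static uint64_t tdmr_align_down(uint64_t addr)
{
	return addr & ~(TDMR_ALIGNMENT - 1);
}

/* Hypothetical helper: round @addr up to a 1GB boundary. */
static uint64_t tdmr_align_up(uint64_t addr)
{
	return (addr + TDMR_ALIGNMENT - 1) & ~(TDMR_ALIGNMENT - 1);
}

/* Smallest 1GB-aligned, 1GB-granular range that covers @mem. */
static struct range tdmr_cover(struct range mem)
{
	struct range tdmr;

	assert(mem.end > mem.start);
	tdmr.start = tdmr_align_down(mem.start);
	tdmr.end = tdmr_align_up(mem.end);
	return tdmr;
}
```

For example, a memory range [0x1000, 0x7fffffff) would be covered by a
TDMR spanning [0x0, 0x80000000) under these assumptions. Everything in
that TDMR that is not TDX-usable memory then has to be accounted for.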
As shown above, the root cause of this failure is that when the kernel
tries to construct a TDMR to cover the address range [0x0, 0x80000000),
there are too many memory holes within that range: their number exceeds
the maximum number of reserved areas.
The E820 table of that platform (see [1] below) reflects this: the
number of memory holes among e820 "usable" entries exceeds 16, which is
the maximum number of reserved areas the TDX module supports in practice.
=== Fix ===
There are two options to fix this: 1) mark fewer memory holes as
"reserved areas" when constructing a TDMR; 2) reduce the TDMR's size so
that it covers fewer memory regions, and thus fewer memory holes.
Option 1) is possible, and in fact is easier and preferable:
TDX has a concept of "Convertible Memory Regions" (CMRs). TDX reports a
list of CMRs that meet TDX's security requirements on memory. TDX
requires that all the "TDX-usable memory regions" that the kernel passes
to the module via TDMRs, i.e., all the "non-reserved regions in TDMRs",
be convertible memory.
In other words, if a memory hole is in fact within a CMR, then it's not
mandatory for the kernel to add it to the reserved areas. The number of
consumed reserved areas can be reduced if the kernel doesn't add those
memory holes as reserved areas. Note this doesn't have any security
impact because the kernel is outside TDX's TCB anyway.
This is feasible because in practice the CMRs just reflect whether the
RAM can actually be used by TDX, so each CMR tends to be a large range
rather than being split into small areas the way the e820 table is by
its many "ACPI *" entries. [2] below shows the CMRs reported on the
problematic platform (using the off-tree TDX code).
So for this particular module initialization failure, the memory holes
within [0x0, 0x80000000) are in fact mostly within CMRs. By not adding
them to the reserved areas, the number of consumed reserved areas for
the TDMR [0x0, 0x80000000) can be dramatically reduced.
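The effect of skipping holes that lie within CMRs can be shown with a
small userspace sketch. This is only an illustration, not the logic in
this patch: struct range, hole_in_cmr() and count_reserved_areas() are
hypothetical names, the memory regions are assumed to be sorted,
non-overlapping and fully inside the TDMR, and a hole is skipped only
when it lies entirely inside a single CMR (a partially covered hole
still counts as one reserved area).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type: a half-open physical address range [start, end). */
struct range {
	uint64_t start;
	uint64_t end;
};

/* True if [start, end) lies entirely inside one of the @cmrs. */
static bool hole_in_cmr(uint64_t start, uint64_t end,
			const struct range *cmrs, size_t nr_cmrs)
{
	for (size_t i = 0; i < nr_cmrs; i++) {
		if (start >= cmrs[i].start && end <= cmrs[i].end)
			return true;
	}
	return false;
}

/*
 * Count the holes in @tdmr that are not covered by the sorted,
 * non-overlapping @mem regions. When @skip_cmr_holes is set, holes
 * that lie fully inside a CMR are not counted.
 */
static size_t count_reserved_areas(struct range tdmr,
				   const struct range *mem, size_t nr_mem,
				   const struct range *cmrs, size_t nr_cmrs,
				   bool skip_cmr_holes)
{
	uint64_t prev_end = tdmr.start;
	size_t count = 0;

	assert(tdmr.end > tdmr.start);

	/* The extra iteration handles the hole after the last region. */
	for (size_t i = 0; i <= nr_mem; i++) {
		uint64_t next_start = (i < nr_mem) ? mem[i].start : tdmr.end;

		if (next_start > prev_end &&
		    !(skip_cmr_holes &&
		      hole_in_cmr(prev_end, next_start, cmrs, nr_cmrs)))
			count++;

		if (i < nr_mem)
			prev_end = mem[i].end;
	}

	return count;
}
```

With made-up inputs (not the layout of the reported platform), a TDMR
[0x0, 0x80000000) containing three memory regions has four holes. If a
single CMR covers three of those holes, only one reserved area is
consumed when CMR-covered holes are skipped.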
On the other hand, although option 2) is also theoretically feasible, it
requires more complicated logic to handle splitting a TDMR into smaller
ones. E.g., today one memory region must be fully in one TDMR, while
splitting a TDMR would result in each TDMR covering only part of some
memory region. It would also increase the total number of TDMRs, which
cannot exceed the maximum that the TDX module supports.
Signed-off-by: Kai Huang <kai.huang@xxxxxxxxx>
---
arch/x86/virt/vmx/tdx/tdx.c | 149 ++++++++++++++++++++++++++++++++----
arch/x86/virt/vmx/tdx/tdx.h | 13 ++++
2 files changed, 146 insertions(+), 16 deletions(-)
diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index ced40e3b516e..88a0c8b788b7 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -293,6 +293,10 @@ static int stbuf_read_sysmd_field(u64 field_id, void *stbuf, int offset,
return 0;
}
+/* Wrapper to read one metadata field to u8/u16/u32/u64 */
+#define stbuf_read_sysmd_single(_field_id, _pdata) \
+ stbuf_read_sysmd_field(_field_id, _pdata, 0, sizeof(typeof(*(_pdata))))
+
struct field_mapping {
u64 field_id;
int offset;
@@ -349,6 +353,76 @@ static int get_tdx_module_version(struct tdx_sysinfo_module_version *modver)
return stbuf_read_sysmd_multi(fields, ARRAY_SIZE(fields), modver);
}
+/* Update the @cmr_info->num_cmrs to trim tail empty CMRs */
+static void trim_empty_tail_cmrs(struct tdx_sysinfo_cmr_info *cmr_info)
+{
+ int i;
+
+ for (i = 0; i < cmr_info->num_cmrs; i++) {
+ u64 cmr_base = cmr_info->cmr_base[i];
+ u64 cmr_size = cmr_info->cmr_size[i];
+
+ if (!cmr_size) {
+ WARN_ON_ONCE(cmr_base);
+ break;
+ }
+
+ /* TDX architecture: CMR must be 4KB aligned */
+ WARN_ON_ONCE(!PAGE_ALIGNED(cmr_base) ||
+ !PAGE_ALIGNED(cmr_size));
+ }
+
+ cmr_info->num_cmrs = i;
+}
+
+#define TD_SYSINFO_MAP_CMR_INFO(_field_id, _member) \
+ TD_SYSINFO_MAP(_field_id, struct tdx_sysinfo_cmr_info, _member)
+
+static int get_tdx_cmr_info(struct tdx_sysinfo_cmr_info *cmr_info)
+{
+ int i, ret;
+
+ ret = stbuf_read_sysmd_single(MD_FIELD_ID_NUM_CMRS,
+ &cmr_info->num_cmrs);
+ if (ret)
+ return ret;
+
+ for (i = 0; i < cmr_info->num_cmrs; i++) {
+ const struct field_mapping fields[] = {
+ TD_SYSINFO_MAP_CMR_INFO(CMR_BASE0 + i, cmr_base[i]),
+ TD_SYSINFO_MAP_CMR_INFO(CMR_SIZE0 + i, cmr_size[i]),
+ };
+
+ ret = stbuf_read_sysmd_multi(fields, ARRAY_SIZE(fields),
+ cmr_info);
+ if (ret)
+ return ret;
+ }
+
+ /*
+ * The TDX module may just report the maximum number of CMRs that
+ * TDX architecturally supports as the actual number of CMRs,
+ * even though the latter is smaller. In this case all the tail
+ * CMRs will be empty. Trim them away.
+ */
+ trim_empty_tail_cmrs(cmr_info);
+
+ return 0;
+}
+
+static void print_cmr_info(struct tdx_sysinfo_cmr_info *cmr_info)
+{
+ int i;
+
+ for (i = 0; i < cmr_info->num_cmrs; i++) {
+ u64 cmr_base = cmr_info->cmr_base[i];
+ u64 cmr_size = cmr_info->cmr_size[i];
+
+ pr_info("CMR[%d]: [0x%llx, 0x%llx)\n", i, cmr_base,
+ cmr_base + cmr_size);
+ }
+}
+
static void print_basic_sysinfo(struct tdx_sysinfo *sysinfo)
{
struct tdx_sysinfo_module_version *modver = &sysinfo->module_version;