[PATCH v6 3/5] x86/virt/tdx: Make module initialization state immutable in reboot notifier

From: Kai Huang
Date: Mon Sep 09 2024 - 04:09:10 EST


If the kernel has ever enabled TDX, part of system memory remains TDX
private memory when kexec happens. E.g., the PAMT (Physical Address
Metadata Table) pages used by the TDX module to track each TDX memory
page's state are never freed once the TDX module is initialized.

In kexec, the kernel will need to convert all TDX private pages back to
normal when the platform has the TDX "partial write machine check"
erratum. Such conversion will need to be done after stopping all remote
CPUs so that no more TDX activity can possibly happen.

Register a reboot notifier to make the TDX module initialization state
immutable during the preparation phase of kexec, so that the kernel can
later use the module state to determine whether the system can possibly
have any TDX private pages. Otherwise, a remote CPU could be stopped in
the middle of module initialization, and the module state would not
reflect that.

Specifically, when the reboot notifier is called, stop any further
module initialization if the kernel hasn't enabled TDX yet. If another
thread is in the middle of initializing the TDX module, wait for that
initialization to finish.

The reboot notifier is triggered when the kernel goes to reboot, kexec,
halt or shutdown. In all of these cases there is no need to let the
kernel continue initializing the TDX module if that hasn't been done
yet.

Signed-off-by: Kai Huang <kai.huang@xxxxxxxxx>
---

v5 -> v6:
- No change

v4 -> v5:
- New patch to split the 'tdx_rebooting' around reboot notifier (Dave).
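
For reviewers, a minimal userspace sketch of the locking scheme described
in the changelog, assuming pthreads in place of the kernel mutex. The
names model_enable(), model_reboot_notifier(), module_lock, rebooting and
module_initialized are hypothetical stand-ins, not kernel symbols, and
the sketch leaves out the existing module-status handling in
tdx_enable(). It only illustrates why setting the flag under the same
lock both waits for an in-flight initialization and rejects later ones.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins for tdx_module_lock and tdx_rebooting. */
static pthread_mutex_t module_lock = PTHREAD_MUTEX_INITIALIZER;
static bool rebooting;
static bool module_initialized;

/* Models the enable path: the flag is checked with the lock held. */
static int model_enable(void)
{
	int ret = 0;

	pthread_mutex_lock(&module_lock);
	if (rebooting)
		ret = -EINVAL;			/* no new initialization once rebooting */
	else
		module_initialized = true;	/* placeholder for the real init work */
	pthread_mutex_unlock(&module_lock);

	return ret;
}

/*
 * Models the reboot notifier: acquiring the lock blocks until any
 * in-flight model_enable() has dropped it, so the state can no longer
 * change once this function returns.
 */
static void model_reboot_notifier(void)
{
	pthread_mutex_lock(&module_lock);
	rebooting = true;
	pthread_mutex_unlock(&module_lock);
}
```

In this model, a call to model_enable() before model_reboot_notifier()
succeeds, and any call made afterwards returns -EINVAL without touching
module_initialized.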


---
arch/x86/virt/vmx/tdx/tdx.c | 29 +++++++++++++++++++++++++++++
1 file changed, 29 insertions(+)

diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 4e2b2e2ac9f9..c33417fe4086 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -27,6 +27,7 @@
 #include <linux/log2.h>
 #include <linux/acpi.h>
 #include <linux/suspend.h>
+#include <linux/reboot.h>
 #include <asm/page.h>
 #include <asm/special_insns.h>
 #include <asm/msr-index.h>
@@ -52,6 +53,8 @@ static DEFINE_MUTEX(tdx_module_lock);
/* All TDX-usable memory regions. Protected by mem_hotplug_lock. */
static LIST_HEAD(tdx_memlist);

+static bool tdx_rebooting;
+
typedef void (*sc_err_func_t)(u64 fn, u64 err, struct tdx_module_args *args);

static inline void seamcall_err(u64 fn, u64 err, struct tdx_module_args *args)
@@ -1185,6 +1188,9 @@ static int __tdx_enable(void)
 {
 	int ret;
 
+	if (tdx_rebooting)
+		return -EINVAL;
+
 	ret = init_tdx_module();
 	if (ret) {
 		pr_err("module initialization failed (%d)\n", ret);
@@ -1418,6 +1424,21 @@ static struct notifier_block tdx_memory_nb = {
 	.notifier_call = tdx_memory_notifier,
 };
 
+static int tdx_reboot_notifier(struct notifier_block *nb, unsigned long mode,
+			       void *unused)
+{
+	/* Wait for ongoing TDX initialization to finish */
+	mutex_lock(&tdx_module_lock);
+	tdx_rebooting = true;
+	mutex_unlock(&tdx_module_lock);
+
+	return NOTIFY_OK;
+}
+
+static struct notifier_block tdx_reboot_nb = {
+	.notifier_call = tdx_reboot_notifier,
+};
+
 static void __init check_tdx_erratum(void)
 {
 	/*
@@ -1472,6 +1493,14 @@ void __init tdx_init(void)
 		return;
 	}
 
+	err = register_reboot_notifier(&tdx_reboot_nb);
+	if (err) {
+		pr_err("initialization failed: register_reboot_notifier() failed (%d)\n",
+				err);
+		unregister_memory_notifier(&tdx_memory_nb);
+		return;
+	}
+
 #if defined(CONFIG_ACPI) && defined(CONFIG_SUSPEND)
 	pr_info("Disable ACPI S3. Turn off TDX in the BIOS to use ACPI S3.\n");
 	acpi_suspend_lowlevel = NULL;
--
2.46.0