TDX guests allocate shared buffers to perform I/O. It is done by<snip>
allocating pages normally from the buddy allocator and converting them
to shared with set_memory_decrypted().
The second kernel has no idea what memory is converted this way. It only
sees E820_TYPE_RAM.
Accessing shared memory via private mapping is fatal. It leads to
unrecoverable TD exit.
On kexec walk direct mapping and convert all shared memory back to
private. It makes all RAM private again and second kernel may use it
normally.
The conversion occurs in two steps: stopping new conversions and
unsharing all memory. In the case of normal kexec, the stopping of
conversions takes place while scheduling is still functioning. This
allows for waiting until any ongoing conversions are finished. The
second step is carried out when all CPUs except one are inactive and
interrupts are disabled. This prevents any conflicts with code that may
access shared memory.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx>
Reviewed-by: Rick Edgecombe <rick.p.edgecombe@xxxxxxxxx>
---
arch/x86/coco/tdx/tdx.c | 124 +++++++++++++++++++++++++++++++++++++++-
1 file changed, 122 insertions(+), 2 deletions(-)
+static void tdx_kexec_stop_conversion(bool crash)In case of SNP and crash/kdump case, we need to prevent the boot_ghcb being converted to shared (in snp_kexec_unshare_mem()) as the boot_ghcb is required to handle all I/O for disabling IO-APIC/lapic, hpet, etc., as the enc_kexec_unshare_mem() callback is invoked before the apics, hpet, etc. are disabled.
+{
+ /* Stop new private<->shared conversions */
+ conversion_allowed = false;
+
+ /*
+ * Make sure conversion_allowed is cleared before checking
+ * conversions_in_progress.
+ */
+ barrier();
+
+ /*
+ * Crash kernel reaches here with interrupts disabled: can't wait for
+ * conversions to finish.
+ *
+ * If race happened, just report and proceed.
+ */
+ if (!crash) {
+ unsigned long timeout;
+
+ /*
+ * Wait for in-flight conversions to complete.
+ *
+ * Do not wait more than 30 seconds.
+ */
+ timeout = 30 * USEC_PER_SEC;
+ while (atomic_read(&conversions_in_progress) && timeout--)
+ udelay(1);
+ }
+
+ if (atomic_read(&conversions_in_progress))
+ pr_warn("Failed to finish shared<->private conversions\n");
+}
+
+static void tdx_kexec_unshare_mem(void)
+{
+ unsigned long addr, end;
+ long found = 0, shared;
+
+ /*
+ * Walk direct mapping and convert all shared memory back to private,
+ */
+
+ addr = PAGE_OFFSET;
+ end = PAGE_OFFSET + get_max_mapped();
+
+ while (addr < end) {
+ unsigned long size;
+ unsigned int level;
+ pte_t *pte;
+
+ pte = lookup_address(addr, &level);
+ size = page_level_size(level);
+
+ if (pte && pte_decrypted(*pte)) {
+ int pages = size / PAGE_SIZE;
+
+ /*
+ * Touching memory with shared bit set triggers implicit
+ * conversion to shared.
+ *
+ * Make sure nobody touches the shared range from
+ * now on.
+ */
+ set_pte(pte, __pte(0));
+
+ if (!tdx_enc_status_changed(addr, pages, true)) {
+ pr_err("Failed to unshare range %#lx-%#lx\n",
+ addr, addr + size);
+ }
+
+ found += pages;
+ }
+
+ addr += size;
+ }
+
+ __flush_tlb_all();
+
+ shared = atomic_long_read(&nr_shared);
+ if (shared != found) {
+ pr_err("shared page accounting is off\n");
+ pr_err("nr_shared = %ld, nr_found = %ld\n", shared, found);
+ }
+}