[RFC PATCH 00/67] KVM: X86: TDX support
From: isaku . yamahata
Date: Mon Nov 16 2020 - 13:28:15 EST
From: Isaku Yamahata <isaku.yamahata@xxxxxxxxx>
* What's TDX?
TDX stands for Trust Domain Extensions which isolates VMs from
the virtual-machine manager (VMM)/hypervisor and any other software on
the platform. [1]
For details, the specifications, [2], [3], [4], [5], [6], [7], are
available.
* The goal of this RFC patch
The purpose of this post is to get feedback early on high level design
issue of KVM enhancement for TDX. The detailed coding (variable naming
etc) is not cared of. This patch series is incomplete (not working).
Although multiple software components, not only KVM but also QEMU,
guest Linux and virtual bios, need to be updated, this includes only
KVM VMM part. For those who are curious to changes to other
component, there are public repositories at github. [8], [9]
* Terminology
Here are short explanations of key concepts.
For detailed explanation or other terminologies, please refer to the
specifications. [2], [3], [4], [5], [6], [7].
- Trusted Domain(TD)
Hardware-isolated virtual machines managed by TDX-module.
- Secure-Arbitration Mode(SEAM)
A new mode of the CPU. It consists of SEAM Root and SEAM Non-Root
which corresponds to VMX Root and VMX Non-Root.
- TDX-module
TDX-module runs in SEAM Root that manages TD guest state.
It provides ABI for VMM to manages TDs. It's expensive operation.
- SEAM loader(SEAMLDR)
Authenticated Code Module(ACM) to load the TDX-module.
- Secure EPT (S-EPT)
An extended Page table that is encrypted.
Shared bit(bit 51 or 47) in GPA selects shared vs private.
0: private to TD, 1: shared with host VMM.
* Major touch/discussion points
The followings are the major touch points where feedback is wanted.
** the file location of the boot code
BSP launches SEAM Loader on BSP to load TDX module. TDX module is on
all CPUs. The directory, arch/x86/kvm/boot/seam, is chosen to locate
the related files in near directory. When maintenance/enhancement in
future, it will be easy to identify that they're related to be synced
with.
- arch/x86/kvm/boot/seam: the current choice
Pros:
- The directory clearly indicates that the code is related to only
KVM.
- Keep files near to the related code (KVM TDX code).
Cons:
- It doesn't follow the existing convention.
Alternative:
The alternative is to follow the existing convention.
- arch/x86/kernel/cpu/
Pros:
- It follows the existing convention.
Cons:
- It's unclear that it's related to only KVM TDX.
- drivers/firmware/
As TDX module can be considered a firmware, yet other choice is
Pros:
- It follows the existing convention. it clarifies that TDX module
is a firmware.
Cons:
- It's hard to understand the firmware is only for KVM TDX.
- The files are far from the related code(KVM TDX).
** Coexistence of normal(VMX) VM and TD VM
It's required to allow both legacy(normal VMX) VMs and new TD VMs to
coexist. Otherwise the benefits of VM flexibility would be eliminated.
The main issue for it is that the logic of kvm_x86_ops callbacks for
TDX is different from VMX. On the other hand, the variable,
kvm_x86_ops, is global single variable. Not per-VM, not per-vcpu.
Several points to be considered.
. No or minimal overhead when TDX is disabled(CONFIG_KVM_INTEL_TDX=n).
. Avoid overhead of indirect call via function pointers.
. Contain the changes under arch/x86/kvm/vmx directory and share logic
with VMX for maintenance.
Even though the ways to operation on VM (VMX instruction vs TDX
SEAM call) is different, the basic idea remains same. So, many
logic can be shared.
. Future maintenance
The huge change of kvm_x86_ops in (near) future isn't expected.
a centralized file is acceptable.
- Wrapping kvm x86_ops: The current choice
Introduce dedicated file for arch/x86/kvm/vmx/main.c (the name,
main.c, is just chosen to show main entry points for callbacks.) and
wrapper functions around all the callbacks with
"if (is-tdx) tdx-callback() else vmx-callback()".
Pros:
- No major change in common x86 KVM code. The change is (mostly)
contained under arch/x86/kvm/vmx/.
- When TDX is disabled(CONFIG_KVM_INTEL_TDX=n), the overhead is
optimized out.
- Micro optimization by avoiding function pointer.
Cons:
- Many boiler plates in arch/x86/kvm/vmx/main.c.
Alternative:
- Introduce another callback layer under arch/x86/kvm/vmx.
Pros:
- No major change in common x86 KVM code. The change is (mostly)
contained under arch/x86/kvm/vmx/.
- clear separation on callbacks.
Cons:
- overhead in VMX even when TDX is disabled(CONFIG_KVM_INTEL_TDX=n).
- Allow per-VM kvm_x86_ops callbacks instead of global kvm_x86_ops
Pros:
- clear separation on callbacks.
Cons:
- Big change in common x86 code.
- overhead in common code even when TDX is
disabled(CONFIG_KVM_INTEL_TDX=n).
- Introduce new directory arch/x86/kvm/tdx
Pros:
- It clarifies that TDX is different from VMX.
Cons:
- Given the level of code sharing, it complicates code sharing.
** KVM MMU Changes
KVM MMU needs to be enhanced to handle Secure/Shared-EPT. The
high-level execution flow is mostly same to normal EPT case.
EPT violation/misconfiguration -> invoke TDP fault handler ->
resolve TDP fault -> resume execution. (or emulate MMIO)
The difference is, that S-EPT is operated(read/write) via TDX SEAM
call which is expensive instead of direct read/write EPT entry.
One bit of GPA (51 or 47 bit) is repurposed so that it means shared
with host(if set to 1) or private to TD(if cleared to 0).
- The current implementation
. Reuse the existing MMU code with minimal update. Because the
execution flow is mostly same. But additional operation, TDX call
for S-EPT, is needed. So add hooks for it to kvm_x86_ops.
. For performance, minimize TDX SEAM call to operate on S-EPT. When
getting corresponding S-EPT pages/entry from faulting GPA, don't
use TDX SEAM call to read S-EPT entry. Instead create shadow copy
in host memory.
Repurpose the existing kvm_mmu_page as shadow copy of S-EPT and
associate S-EPT to it.
. Treats share bit as attributes. mask/unmask the bit where
necessary to keep the existing traversing code works.
Introduce kvm.arch.gfn_shared_mask and use "if (gfn_share_mask)"
for special case.
= 0 : for non-TDX case
= 51 or 47 bit set for TDX case.
Pros:
- Large code reuse with minimal new hooks.
- Execution path is same.
Cons:
- Complicates the existing code.
- Repurpose kvm_mmu_page as shadow of Secure-EPT can be confusing.
Alternative:
- Replace direct read/write on EPT entry with TDX-SEAM call by
introducing callbacks on EPT entry.
Pros:
- Straightforward.
Cons:
- Too many touching point.
- Too slow due to TDX-SEAM call.
- Overhead even when TDX is disabled(CONFIG_KVM_INTEL_TDX=n).
- Sprinkle "if (is-tdx)" for TDX special case
Pros:
- Straightforward.
Cons:
- The result is non-generic and ugly.
- Put TDX specific logic into common KVM MMU code.
** New KVM API, ioctl (sub)command, to manage TD VMs
Additional KVM API are needed to control TD VMs. The operations on TD
VMs are specific to TDX.
- Piggyback and repurpose KVM_MEMORY_ENCRYPT_OP
Although not all operation isn't memory encryption, repupose to get
TDX specific ioctls.
Pros:
- No major change in common x86 KVM code.
Cons:
- The operations aren't actually memory encryption, but operations
on TD VMs.
Alternative:
- Introduce new ioctl for guest protection like
KVM_GUEST_PROTECTION_OP and introduce subcommand for TDX.
Pros:
- Clean name.
Cons:
- One more new ioctl for guest protection.
- Confusion with KVM_MEMORY_ENCRYPT_OP with KVM_GUEST_PROTECTION_OP.
- Rename KVM_MEMORY_ENCRYPT_OP to KVM_GUEST_PROTECTION_OP and keep
KVM_MEMORY_ENCRYPT_OP as same value for user API for compatibility.
"#define KVM_MEMORY_ENCRYPT_OP KVM_GUEST_PROTECTION_OP" for uapi
compatibility.
Pros:
- No new ioctl with more suitable name.
Cons:
- May cause confusion to the existing user program.
* Items unsupported/out of the scope
Those items are unsupported at the moment or out of the scope.
- Large page(2MB, 1GB) support
- Page migration
- Debugger support(qemu gdb stub)
- Removing user space(qemu) mapping of guest private memory
Because this topic itself is big and will take time, the effort is
taking place independently. [12]
- Attestation
The end-to-end integration is required.
- Live migration
TDX 1.0 doesn't support this.
- Nested virtualization
TDX 1.0 doesn't support this.
* Related repositories
TDX enabling software are composed of several components. Not only
KVM/x86 enablement, but also other components. There are several
publicly available repositories in github. Those are not complete, not
working, but only for reference for those who are curious.
- TDX host/guest [8]
- TDX Virtual Firmware [9]
- qemu change isn't published (yet).
* Related presentations
At KVM forum 2020, several presentation related to TDX were given. [10] [11]
They are helpful to understand TDX and KVM/qemu related changes.
* Patch organization
The main changes are only 2 patches(62 and 64).
The preceding patches(01-61) are refactoring the code and introducing
additional hooks. The patch 64 plugs hooks into TDX implementation.
- patch 01-16: They are preparations. introduce architecture
constants, code refactoring, export symbols for
following patches.
- patch 17-33: start to introduce the new type of VM and allow the
coexistence of multiple type of VM. allow/disallow KVM
ioctl where appropriate. Especially make per-system
ioctl to per-VM ioctl.
- patch 34-43: refactoring KVM MMU and adding new hooks for Secure
EPT.
- patch 44-48: refactoring KVM/VMX code + wrapper for kvm_x86_ops for
VMX and TDX.
- patch 52-61: introducing TDX architectural constants/structures and
helper functions.
- patch 62-63: load/init TDX module during boot.
- patch 64-65: main patch to add "basic" support for building/running
TDX.
- patch 66 : This patch is not for review, but to make build success.
[1] TDX specification
https://software.intel.com/content/www/us/en/develop/articles/intel-trust-domain-extensions.html
[2] Intel Trust Domain Extensions (Intel TDX)
https://software.intel.com/content/dam/develop/external/us/en/documents/tdx-whitepaper-final9-17.pdf
[3] Intel CPU Architectural Extensions Specification
https://software.intel.com/content/dam/develop/external/us/en/documents/intel-tdx-cpu-architectural-specification.pdf
[4] Intel TDX Module 1.0 EAS
https://software.intel.com/content/dam/develop/external/us/en/documents/intel-tdx-module-1eas.pdf
[5] Intel TDX Loader Interface Specification
https://software.intel.com/content/dam/develop/external/us/en/documents/intel-tdx-seamldr-interface-specification.pdf
[6] Intel TDX Guest-Hypervisor Communication Interface
https://software.intel.com/content/dam/develop/external/us/en/documents/intel-tdx-guest-hypervisor-communication-interface.pdf
[7] Intel TDX Virtual Firmware Design Guide
https://software.intel.com/content/dam/develop/external/us/en/documents/tdx-virtual-firmware-design-guide-rev-1.
[8] intel public github
kvm TDX branch: https://github.com/intel/tdx/tree/kvm
TDX guest branch: https://github.com/intel/tdx/tree/guest
[9] tdvf
https://github.com/tianocore/edk2-staging/tree/TDVF
[10] KVM forum 2020: Intel Virtualization Technology Extensions to
Enable Hardware Isolated VMs
https://osseu2020.sched.com/event/eDzm/intel-virtualization-technology-extensions-to-enable-hardware-isolated-vms-sean-christopherson-intel
[11] Linux Security Summit EU 2020:
Architectural Extensions for Hardware Virtual Machine Isolation
to Advance Confidential Computing in Public Clouds - Ravi Sahita
& Jun Nakajima, Intel Corporation
https://osseu2020.sched.com/event/eDOx/architectural-extensions-for-hardware-virtual-machine-isolation-to-advance-confidential-computing-in-public-clouds-ravi-sahita-jun-nakajima-intel-corporation
[12] [RFCv2,00/16] KVM protected memory extension
https://lkml.org/lkml/2020/10/20/66
Isaku Yamahata (4):
KVM: x86: Make KVM_CAP_X86_SMM a per-VM capability
KVM: Add per-VM flag to mark read-only memory as unsupported
fixup! KVM: TDX: Add "basic" support for building and running Trust
Domains
KVM: X86: not for review: add dummy file for TDX-SEAM module
Kai Huang (3):
KVM: x86: Add per-VM flag to disable in-kernel I/O APIC and level
routes
KVM: TDX: Add SEAMRR related MSRs macro definition
cpu/hotplug: Document that TDX also depends on booting CPUs once
Rick Edgecombe (1):
KVM: x86: Add infrastructure for stolen GPA bits
Sean Christopherson (58):
x86/cpufeatures: Add synthetic feature flag for TDX (in host)
x86/msr-index: Define MSR_IA32_MKTME_KEYID_PART used by TDX
KVM: Export kvm_io_bus_read for use by TDX for PV MMIO
KVM: Enable hardware before doing arch VM initialization
KVM: x86: Split core of hypercall emulation to helper function
KVM: x86: Export kvm_mmio tracepoint for use by TDX for PV MMIO
KVM: x86/mmu: Zap only leaf SPTEs for deleted/moved memslot by default
KVM: Add infrastructure and macro to mark VM as bugged
KVM: Export kvm_make_all_cpus_request() for use in marking VMs as
bugged
KVM: x86: Use KVM_BUG/KVM_BUG_ON to handle bugs that are fatal to the
VM
KVM: x86/mmu: Mark VM as bugged if page fault returns RET_PF_INVALID
KVM: VMX: Explicitly check for hv_remote_flush_tlb when loading pgd()
KVM: Add max_vcpus field in common 'struct kvm'
KVM: x86: Add vm_type to differentiate legacy VMs from protected VMs
KVM: x86: Hoist kvm_dirty_regs check out of sync_regs()
KVM: x86: Introduce "protected guest" concept and block disallowed
ioctls
KVM: x86: Add per-VM flag to disable direct IRQ injection
KVM: x86: Add flag to disallow #MC injection / KVM_X86_SETUP_MCE
KVM: x86: Add flag to mark TSC as immutable (for TDX)
KVM: Add per-VM flag to disable dirty logging of memslots for TDs
KVM: x86: Allow host-initiated WRMSR to set X2APIC regardless of CPUID
KVM: x86: Add kvm_x86_ops .cache_gprs() and .flush_gprs()
KVM: x86: Add support for vCPU and device-scoped KVM_MEMORY_ENCRYPT_OP
KVM: x86: Introduce vm_teardown() hook in kvm_arch_vm_destroy()
KVM: x86: Add a switch_db_regs flag to handle TDX's auto-switched
behavior
KVM: x86: Check for pending APICv interrupt in kvm_vcpu_has_events()
KVM: x86: Add option to force LAPIC expiration wait
KVM: x86: Add guest_supported_xss placholder
KVM: Export kvm_is_reserved_pfn() for use by TDX
KVM: x86/mmu: Explicitly check for MMIO spte in fast page fault
KVM: x86/mmu: Track shadow MMIO value on a per-VM basis
KVM: x86/mmu: Ignore bits 63 and 62 when checking for "present" SPTEs
KVM: x86/mmu: Allow non-zero init value for shadow PTE
KVM: x86/mmu: Refactor shadow walk in __direct_map() to reduce
indentation
KVM: x86/mmu: Return old SPTE from mmu_spte_clear_track_bits()
KVM: x86/mmu: Frame in support for private/inaccessible shadow pages
KVM: x86/mmu: Move 'pfn' variable to caller of direct_page_fault()
KVM: x86/mmu: Introduce kvm_mmu_map_tdp_page() for use by TDX
KVM: VMX: Modify NMI and INTR handlers to take intr_info as param
KVM: VMX: Move NMI/exception handler to common helper
KVM: VMX: Split out guts of EPT violation to common/exposed function
KVM: VMX: Define EPT Violation architectural bits
KVM: VMX: Define VMCS encodings for shared EPT pointer
KVM: VMX: Add 'main.c' to wrap VMX and TDX
KVM: VMX: Move setting of EPT MMU masks to common VT-x code
KVM: VMX: Move register caching logic to common code
KVM: TDX: Add TDX "architectural" error codes
KVM: TDX: Add architectural definitions for structures and values
KVM: TDX: Define TDCALL exit reason
KVM: TDX: Add macro framework to wrap TDX SEAMCALLs
KVM: TDX: Stub in tdx.h with structs, accessors, and VMCS helpers
KVM: VMX: Add macro framework to read/write VMCS for VMs and TDs
KVM: VMX: Move AR_BYTES encoder/decoder helpers to common.h
KVM: VMX: MOVE GDT and IDT accessors to common code
KVM: VMX: Move .get_interrupt_shadow() implementation to common VMX
code
KVM: TDX: Load and init TDX-SEAM module during boot
KVM: TDX: Add "basic" support for building and running Trust Domains
KVM: x86: Mark the VM (TD) as bugged if non-coherent DMA is detected
Zhang Chen (1):
x86/cpu: Move get_builtin_firmware() common code (from microcode only)
arch/arm64/include/asm/kvm_host.h | 3 -
arch/arm64/kvm/arm.c | 7 +-
arch/arm64/kvm/vgic/vgic-init.c | 6 +-
arch/x86/Kbuild | 1 +
arch/x86/include/asm/cpu.h | 5 +
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/kvm_boot.h | 43 +
arch/x86/include/asm/kvm_host.h | 52 +-
arch/x86/include/asm/microcode.h | 3 -
arch/x86/include/asm/msr-index.h | 10 +
arch/x86/include/asm/vmx.h | 6 +
arch/x86/include/asm/vmxfeatures.h | 2 +-
arch/x86/include/uapi/asm/kvm.h | 55 +
arch/x86/include/uapi/asm/vmx.h | 4 +-
arch/x86/kernel/cpu/common.c | 20 +
arch/x86/kernel/cpu/intel.c | 4 +
arch/x86/kernel/cpu/microcode/core.c | 18 -
arch/x86/kernel/cpu/microcode/intel.c | 1 +
arch/x86/kernel/setup.c | 3 +
arch/x86/kvm/Kconfig | 8 +
arch/x86/kvm/Makefile | 2 +-
arch/x86/kvm/boot/Makefile | 5 +
arch/x86/kvm/boot/seam/seamldr.S | 188 +++
arch/x86/kvm/boot/seam/seamloader.c | 162 +++
arch/x86/kvm/boot/seam/tdx.c | 1131 +++++++++++++++
arch/x86/kvm/ioapic.c | 4 +
arch/x86/kvm/irq_comm.c | 6 +-
arch/x86/kvm/lapic.c | 9 +-
arch/x86/kvm/lapic.h | 2 +-
arch/x86/kvm/mmu.h | 33 +-
arch/x86/kvm/mmu/mmu.c | 519 +++++--
arch/x86/kvm/mmu/mmu_internal.h | 5 +
arch/x86/kvm/mmu/paging_tmpl.h | 27 +-
arch/x86/kvm/mmu/spte.c | 36 +-
arch/x86/kvm/mmu/spte.h | 30 +-
arch/x86/kvm/svm/svm.c | 22 +-
arch/x86/kvm/trace.h | 57 +
arch/x86/kvm/vmx/common.h | 180 +++
arch/x86/kvm/vmx/main.c | 1130 +++++++++++++++
arch/x86/kvm/vmx/posted_intr.c | 6 +
arch/x86/kvm/vmx/tdx.c | 1847 +++++++++++++++++++++++++
arch/x86/kvm/vmx/tdx.h | 245 ++++
arch/x86/kvm/vmx/tdx_arch.h | 230 +++
arch/x86/kvm/vmx/tdx_errno.h | 91 ++
arch/x86/kvm/vmx/tdx_ops.h | 544 ++++++++
arch/x86/kvm/vmx/tdx_stubs.c | 45 +
arch/x86/kvm/vmx/vmenter.S | 140 ++
arch/x86/kvm/vmx/vmx.c | 537 ++-----
arch/x86/kvm/vmx/vmx.h | 2 +
arch/x86/kvm/x86.c | 296 +++-
include/linux/kvm_host.h | 51 +-
include/uapi/linux/kvm.h | 2 +
kernel/cpu.c | 4 +
lib/firmware/intel-seam/libtdx.so | 0
tools/arch/x86/include/uapi/asm/kvm.h | 55 +
tools/include/uapi/linux/kvm.h | 2 +
virt/kvm/kvm_main.c | 45 +-
57 files changed, 7230 insertions(+), 712 deletions(-)
create mode 100644 arch/x86/include/asm/kvm_boot.h
create mode 100644 arch/x86/kvm/boot/Makefile
create mode 100644 arch/x86/kvm/boot/seam/seamldr.S
create mode 100644 arch/x86/kvm/boot/seam/seamloader.c
create mode 100644 arch/x86/kvm/boot/seam/tdx.c
create mode 100644 arch/x86/kvm/vmx/common.h
create mode 100644 arch/x86/kvm/vmx/main.c
create mode 100644 arch/x86/kvm/vmx/tdx.c
create mode 100644 arch/x86/kvm/vmx/tdx.h
create mode 100644 arch/x86/kvm/vmx/tdx_arch.h
create mode 100644 arch/x86/kvm/vmx/tdx_errno.h
create mode 100644 arch/x86/kvm/vmx/tdx_ops.h
create mode 100644 arch/x86/kvm/vmx/tdx_stubs.c
create mode 100644 lib/firmware/intel-seam/libtdx.so
--
2.17.1