Re: [PATCH v12 00/29] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support

From: Paolo Bonzini
Date: Sat Mar 30 2024 - 17:44:31 EST


On 3/29/24 23:58, Michael Roth wrote:
This patchset is also available at:

https://github.com/amdese/linux/commits/snp-host-v12

and is based on top of the following series:

[PATCH gmem 0/6] gmem fix-ups and interfaces for populating gmem pages
https://lore.kernel.org/kvm/20240329212444.395559-1-michael.roth@xxxxxxx/

which in turn is based on:

https://git.kernel.org/pub/scm/virt/kvm/kvm.git/log/?h=kvm-coco-queue


Patch Layout
------------

01-04: These patches are minor dependencies for this series and will
eventually make their way upstream through other trees. They are
included here only temporarily.

05-09: These patches add some basic infrastructure and introduce a new
KVM_X86_SNP_VM vm_type to handle differences versus the existing
KVM_X86_SEV_VM and KVM_X86_SEV_ES_VM types.

10-12: These implement the KVM API to handle the creation of a
cryptographic launch context, encrypt/measure the initial image
into guest memory, and finalize it before launching it.

13-20: These implement handling for various guest-generated events such
as page state changes, onlining of additional vCPUs, etc.

21-24: These implement the gmem hooks needed to prepare gmem-allocated
pages before mapping them into guest private memory ranges as
well as cleaning them up prior to returning them to the host for
use as normal memory. Because this supplants certain activities
like issuing WBINVDs during KVM MMU invalidations, there's also
a patch that avoids duplicating that work, which would otherwise
add unnecessary overhead.

25: With all the core support in place, this patch adds a kvm_amd module
parameter to enable SNP support.

26-29: These patches all deal with the servicing of guest requests to handle
things like attestation, as well as some related host-management
interfaces.
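The launch flow handled by patches 10-12 can be pictured as a small state machine: a launch context is created, the initial image is encrypted and measured piece by piece, and the context is finalized before the guest runs. The sketch below is a hypothetical model under that assumption; the enum values and function name are illustrative, modeled loosely on the LAUNCH_START/UPDATE/FINISH sequence, and are not the KVM_SEV_SNP_* uAPI or code from this series.

```c
#include <stdbool.h>

/* Hypothetical model of the SNP launch sequence (not the KVM uAPI). */
enum snp_launch_state { SNP_STATE_NONE, SNP_STATE_STARTED, SNP_STATE_FINISHED };
enum snp_launch_cmd { SNP_CMD_LAUNCH_START, SNP_CMD_LAUNCH_UPDATE, SNP_CMD_LAUNCH_FINISH };

struct snp_launch_ctx {
    enum snp_launch_state state;
    unsigned int pages_measured; /* pages encrypted/measured so far */
};

/* Returns true if cmd is legal in the current state and advances the
 * context; returns false (and changes nothing) for out-of-order commands. */
bool snp_launch_step(struct snp_launch_ctx *ctx, enum snp_launch_cmd cmd,
                     unsigned int npages)
{
    switch (cmd) {
    case SNP_CMD_LAUNCH_START:
        /* Create the cryptographic launch context exactly once. */
        if (ctx->state != SNP_STATE_NONE)
            return false;
        ctx->state = SNP_STATE_STARTED;
        return true;
    case SNP_CMD_LAUNCH_UPDATE:
        /* Encrypt/measure more of the initial image; repeatable. */
        if (ctx->state != SNP_STATE_STARTED)
            return false;
        ctx->pages_measured += npages;
        return true;
    case SNP_CMD_LAUNCH_FINISH:
        /* Finalize; no further updates are accepted afterwards. */
        if (ctx->state != SNP_STATE_STARTED)
            return false;
        ctx->state = SNP_STATE_FINISHED;
        return true;
    }
    return false;
}
```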


Testing
-------

For testing this via QEMU, use the following tree:

https://github.com/amdese/qemu/commits/snp-v4-wip2

A patched OVMF is also needed due to upstream KVM no longer supporting MMIO
ranges that are mapped as private. It is recommended you build the AmdSevX64
variant as it provides the kernel-hashing support present in this series:

https://github.com/amdese/ovmf/commits/apic-mmio-fix1c

A basic command-line invocation for SNP would be:

qemu-system-x86_64 -smp 32,maxcpus=255 -cpu EPYC-Milan-v2 \
  -machine q35,confidential-guest-support=sev0,memory-backend=ram1 \
  -object memory-backend-memfd,id=ram1,size=4G,share=true,reserve=false \
  -object sev-snp-guest,id=sev0,cbitpos=51,reduced-phys-bits=1,id-auth= \
  -bios /home/mroth/ovmf/OVMF_CODE-upstream-20240228-apicfix-1c-AmdSevX64.fd

With kernel-hashing and certificate data supplied:

qemu-system-x86_64 -smp 32,maxcpus=255 -cpu EPYC-Milan-v2 \
  -machine q35,confidential-guest-support=sev0,memory-backend=ram1 \
  -object memory-backend-memfd,id=ram1,size=4G,share=true,reserve=false \
  -object sev-snp-guest,id=sev0,cbitpos=51,reduced-phys-bits=1,id-auth=,certs-path=/home/mroth/cert.blob,kernel-hashes=on \
  -bios /home/mroth/ovmf/OVMF_CODE-upstream-20240228-apicfix-1c-AmdSevX64.fd \
  -kernel /boot/vmlinuz-6.8.0-snp-host-v12-wip40+ \
  -initrd /boot/initrd.img-6.8.0-snp-host-v12-wip40+ \
  -append "root=UUID=d72a6d1c-06cf-4b79-af43-f1bac4f620f9 ro console=ttyS0,115200n8"


Known issues / TODOs
--------------------

* Base tree in some cases reports "Unpatched return thunk in use. This should
not happen!" the first time it runs an SVM/SEV/SNP guest. This is a recent
regression upstream and unrelated to this series:

https://lore.kernel.org/linux-kernel/CANpmjNOcKzEvLHoGGeL-boWDHJobwfwyVxUqMq2kWeka3N4tXA@xxxxxxxxxxxxxx/T/

* 2MB hugepage support has been dropped pending discussion on how we plan
to re-enable it in gmem.

* Host kexec should work, but there is a known issue with handling host
kdump while SNP guests are running which will be addressed as a follow-up.

* SNP kselftests are currently a WIP and will be included as part of SNP
upstreaming efforts in the near-term.


SEV-SNP Overview
----------------

This part of the Secure Nested Paging (SEV-SNP) series focuses on the
changes required to add KVM support for SEV-SNP. This series builds upon
SEV-SNP guest support, which is now in mainline, and SEV-SNP host
initialization support, which is now in linux-next.

While this series provides the basic building blocks to support booting
SEV-SNP VMs, it does not cover all the security enhancements introduced by
SEV-SNP, such as interrupt protection, which will be added in the future.

With SNP, when pages are marked as guest-owned in the RMP table, they are
assigned to a specific guest/ASID, as well as a specific GFN within the
guest. Any attempts to map it in the RMP table to a different guest/ASID,
or a different GFN within a guest/ASID, will result in an RMP nested page
fault.
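A minimal model of the ownership check described above, assuming a simplified RMP entry that records only the assigned bit, the owning ASID, and the bound GFN (real entries carry more, such as page size and VMPL permissions). All names here are hypothetical; a false return stands in for the RMP nested page fault on a private guest access.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified RMP entry. */
struct rmp_entry {
    bool assigned;  /* true: guest-owned; false: hypervisor-owned */
    uint32_t asid;  /* owning guest, meaningful only if assigned */
    uint64_t gfn;   /* GFN this page is bound to, only if assigned */
};

/* Models a private (encrypted) guest access to the page backing gfn.
 * The access is allowed only if the page is assigned to this ASID and
 * bound to this GFN; any mismatch models an RMP nested page fault. */
bool rmp_check_private_access(const struct rmp_entry *e, uint32_t asid,
                              uint64_t gfn)
{
    if (!e->assigned)
        return false; /* not guest-owned: private access faults */
    return e->asid == asid && e->gfn == gfn;
}
```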

Prior to accessing a guest-owned page, the guest must validate it with the
special PVALIDATE instruction, which sets the validated bit in the page's RMP
entry. This is the only way to set the validated bit outside of the
initial pre-encrypted guest payload/image; any attempts outside the guest to
modify the RMP entry from that point forward will result in the validated
bit being cleared, at which point the guest will trigger an exception if it
attempts to access that page so it can be made aware of possible tampering.
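The validated-bit life cycle can be sketched the same way. This is a hypothetical model, not hardware or kernel code: model_pvalidate() stands in for PVALIDATE, model_host_rmp_update() stands in for any host-side modification of the entry (which always clears the validated bit, even if the new values match the old ones), and a false return from model_guest_access() represents the exception the guest would take.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified RMP entry with a validated bit. */
struct rmp_entry {
    bool assigned;
    bool validated;
    uint32_t asid;
    uint64_t gfn;
};

/* Guest side: validation is only possible on a page assigned to this
 * guest at this GFN. */
bool model_pvalidate(struct rmp_entry *e, uint32_t asid, uint64_t gfn)
{
    if (!e->assigned || e->asid != asid || e->gfn != gfn)
        return false;
    e->validated = true;
    return true;
}

/* Host side: any update to the entry clears the validated bit. */
void model_host_rmp_update(struct rmp_entry *e, bool assigned,
                           uint32_t asid, uint64_t gfn)
{
    e->assigned = assigned;
    e->asid = asid;
    e->gfn = gfn;
    e->validated = false;
}

/* Guest access succeeds only on an assigned, validated, matching page. */
bool model_guest_access(const struct rmp_entry *e, uint32_t asid,
                        uint64_t gfn)
{
    return e->assigned && e->validated && e->asid == asid && e->gfn == gfn;
}
```

Because the host cannot re-set the bit itself, a guest that finds a previously validated page suddenly unvalidated knows the entry was touched behind its back.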

One exception to this is the initial guest payload, which is pre-validated
by the firmware prior to launching. The guest can use Guest Message requests
to fetch an attestation report which will include the measurement of the
initial image so that the guest can verify it was booted with the expected
image/environment.
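On the verifier's side, the final step reduces to comparing the reported launch measurement with one computed independently from the expected image. A sketch, assuming the 48-byte (SHA-384) measurement field of the SNP attestation report and that the report's signature (VCEK or VLEK) has already been checked; the function and macro names are hypothetical. The loop deliberately has no early exit, a common habit for security-sensitive comparisons.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SNP_MEASUREMENT_LEN 48 /* SHA-384 digest size, assumed field width */

/* Returns true if the reported measurement equals the expected one.
 * Every byte is examined regardless of where a mismatch occurs. */
bool measurement_matches(const uint8_t expected[SNP_MEASUREMENT_LEN],
                         const uint8_t reported[SNP_MEASUREMENT_LEN])
{
    uint8_t diff = 0;

    for (size_t i = 0; i < SNP_MEASUREMENT_LEN; i++)
        diff |= expected[i] ^ reported[i];
    return diff == 0;
}
```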

After boot, guests can use Page State Change requests to switch pages
between shared/hypervisor-owned and private/guest-owned to share data for
things like DMA, virtio buffers, and other GHCB requests.

In this implementation of SEV-SNP, private guest memory is managed by a new
kernel framework called guest_memfd (gmem). With gmem, a new
KVM_SET_MEMORY_ATTRIBUTES KVM ioctl has been added to tell the KVM
MMU whether a particular GFN should be backed by shared (normal) memory or
private (gmem-allocated) memory. To tie into this, Page State Change
requests are forwarded to userspace via KVM_EXIT_VMGEXIT exits, which will
then issue the corresponding KVM_SET_MEMORY_ATTRIBUTES call to set the
private/shared state in the KVM MMU.
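The userspace half of this flow might look like the following sketch. It assumes a hypothetical decoded form of the guest's Page State Change entries (4K pages only) and coalesces contiguous GFNs moving to the same state into one request. struct mem_attr_req is a local stand-in that mirrors the field layout of struct kvm_memory_attributes from <linux/kvm.h>, and MODEL_ATTR_PRIVATE uses the same bit as KVM_MEMORY_ATTRIBUTE_PRIVATE; decoding the KVM_EXIT_VMGEXIT exit itself is not shown.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT_4K 12
#define MODEL_ATTR_PRIVATE (1ULL << 3) /* same bit as KVM_MEMORY_ATTRIBUTE_PRIVATE */

/* Hypothetical decoded Page State Change entry. */
struct psc_entry {
    uint64_t gfn;
    bool to_private; /* true: shared->private, false: private->shared */
};

/* Local mirror of struct kvm_memory_attributes' fields. */
struct mem_attr_req {
    uint64_t address;    /* guest physical address */
    uint64_t size;       /* bytes */
    uint64_t attributes; /* MODEL_ATTR_PRIVATE or 0 (shared) */
    uint64_t flags;      /* must be 0 */
};

/* Coalesces runs of contiguous GFNs with the same target state into one
 * request each. out must hold at least n entries. Returns the number of
 * requests written. */
size_t psc_to_attr_reqs(const struct psc_entry *in, size_t n,
                        struct mem_attr_req *out)
{
    size_t nreqs = 0;

    for (size_t i = 0; i < n; i++) {
        uint64_t attrs = in[i].to_private ? MODEL_ATTR_PRIVATE : 0;
        uint64_t addr = in[i].gfn << PAGE_SHIFT_4K;
        struct mem_attr_req *prev = nreqs ? &out[nreqs - 1] : NULL;

        /* Extend the previous run if this page directly follows it. */
        if (prev && prev->attributes == attrs &&
            prev->address + prev->size == addr) {
            prev->size += 1ULL << PAGE_SHIFT_4K;
            continue;
        }
        out[nreqs++] = (struct mem_attr_req){
            .address = addr,
            .size = 1ULL << PAGE_SHIFT_4K,
            .attributes = attrs,
            .flags = 0,
        };
    }
    return nreqs;
}
```

Each resulting request would then be issued as a KVM_SET_MEMORY_ATTRIBUTES ioctl on the VM fd; coalescing simply reduces the number of ioctls for large conversions.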

The gmem / KVM MMU hooks implemented in this series will then update the RMP
table entries for the backing PFNs to set them to guest-owned/private when
mapping private pages into the guest via KVM MMU, or use the normal KVM MMU
handling in the case of shared pages where the corresponding RMP table
entries are left in the default shared/hypervisor-owned state.
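The private/shared split in these hooks can be modeled as follows. This is a hypothetical sketch, not the series' gmem hooks: mapping a private GFN assigns the backing page's RMP entry to the guest (rejecting a page already owned by a different guest or GFN), mapping a shared GFN leaves the entry hypervisor-owned, and invalidation returns the page to the host before it is reused as normal memory.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified RMP entry for the PFN backing a GFN. */
struct rmp_entry {
    bool assigned;
    uint32_t asid;
    uint64_t gfn;
};

/* "Prepare" step: bind the page to (asid, gfn). Returns 0 on success,
 * -1 if the page is already assigned to a different owner or GFN. */
int model_gmem_prepare(struct rmp_entry *rmp, uint32_t asid, uint64_t gfn)
{
    if (rmp->assigned)
        return (rmp->asid == asid && rmp->gfn == gfn) ? 0 : -1;
    rmp->assigned = true;
    rmp->asid = asid;
    rmp->gfn = gfn;
    return 0;
}

/* Fault path: only private mappings touch the RMP entry. */
int model_map_gfn(struct rmp_entry *rmp, bool is_private, uint32_t asid,
                  uint64_t gfn)
{
    if (!is_private)
        return 0; /* shared: entry stays hypervisor-owned */
    return model_gmem_prepare(rmp, asid, gfn);
}

/* Invalidation: return the page to the default hypervisor-owned state. */
void model_gmem_invalidate(struct rmp_entry *rmp)
{
    rmp->assigned = false;
    rmp->asid = 0;
    rmp->gfn = 0;
}
```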

Feedback/review is very much appreciated!

-Mike

Changes since v11:

* Rebase series on kvm-coco-queue and re-work to leverage more
infrastructure between SNP/TDX series.
* Drop KVM_SNP_INIT in favor of the new KVM_SEV_INIT2 interface introduced
here (Paolo):
https://lore.kernel.org/lkml/20240318233352.2728327-1-pbonzini@xxxxxxxxxx/
* Drop exposure of API fields related to things like VMPL levels, migration
agents, etc., until they are actually supported/used (Sean)
* Rework KVM_SEV_SNP_LAUNCH_UPDATE handling to use a new
kvm_gmem_populate() interface instead of copying data directly into
gmem-allocated pages (Sean)
* Add support for SNP_LOAD_VLEK, rework the SNP_SET_CONFIG_{START,END} to
have simpler semantics that are applicable to management of SNP_LOAD_VLEK
updates as well, rename interfaces to the now more appropriate
SNP_{PAUSE,RESUME}_ATTESTATION
* Fix up documentation wording and don't print warnings for
userspace-triggerable failures (Peter, Sean)
* Fix a race with AP_CREATION wake-up events (Jacob, Sean)
* Fix a memory leak with VMSA pages (Sean)
* Tighten up handling of RMP page faults to better distinguish between real
and spurious cases (Tom)
* Various patch/documentation rewording, cleanups, etc.

I skipped a few patches that deal mostly with AMD ABIs. Here are the ones that have nontrivial remarks, which are probably worth a reply before sending v13:

- patch 10: some extra checks on input parameters, and possibly forbidding SEV/SEV-ES ioctls for SEV-SNP guests?

- patch 12: a (hopefully) simple question on boot_vcpu_handled

- patch 18: see Sean's objections at https://lore.kernel.org/lkml/ZeCqnq7dLcJI41O9@xxxxxxxxxx/

- patch 22: question on ignoring PSMASH failures and possibly adding a kvm_arch_gmem_invalidate_begin() API.

With respect to the six preparatory patches, I'll merge them in kvm-coco-queue early next week. However I'll explode the arguments to kvm_gmem_populate(), while also removing "memslot" and merging "src" with "do_memcpy". I'll post my version very early.

Paolo