Re: [RFC] KVM: x86: Support KVM VMs sharing SEV context

From: Tobin Feldman-Fitzthum
Date: Thu Mar 11 2021 - 10:31:46 EST


On 24/02/21 09:59, Nathan Tempelman wrote:

Add a capability for userspace to mirror SEV encryption context from
one vm to another. On our side, this is intended to support a
Migration Helper vCPU, but it can also be used generically to support
other in-guest workloads scheduled by the host. The intention is for
the primary guest and the mirror to have nearly identical memslots.

The primary benefits of this are that:
1) The VMs do not share KVM contexts (think APIC/MSRs/etc), so they
can't accidentally clobber each other.
2) The VMs can have different memory-views, which is necessary for post-copy
migration (the migration vCPUs on the target need to read and write to
pages, when the primary guest would VMEXIT).

This does not change the threat model for AMD SEV. Any memory involved
is still owned by the primary guest and its initial state is still
attested to through the normal SEV_LAUNCH_* flows. If userspace wanted
to circumvent SEV, they could achieve the same effect by simply attaching
a vCPU to the primary VM.
This patch deliberately leaves userspace in charge of the memslots for the
mirror, as it already has the power to mess with them in the primary guest.

This patch does not support SEV-ES (much less SNP), as it does not
handle handing off attested VMSAs to the mirror.

For additional context, we need a Migration Helper because SEV PSP migration
is far too slow for our live migration on its own. Using an in-guest
migrator lets us speed this up significantly.

Hello,

At IBM, we've been thinking a lot about migrating confidential virtual machines. Maybe you've seen the approach that Dov Murik and I shared on the QEMU and OVMF mailing lists. In general, we have tried to implement migration without kernel support, which has some drawbacks. Mainly, it is difficult to dynamically start the migration handler without kernel support, which puts stress on OVMF. If there is momentum behind these KVM patches, we think they could go hand-in-hand with some of the work that we have done.

I'm not sure if you have patches for a migration handler/helper or hypervisor support. If you do, I'd be curious to see them. If not, maybe we should try to converge some of the work that has already happened. I think that no matter where the migration handler (MH) ends up running or how it is started, it will do more or less the same things: export pages to the hypervisor (HV) and import pages from the HV. Likewise, the hypervisor will probably need comparable mechanisms to ask the MH for encrypted pages. Given that we already have some of these things, maybe there is a way to bring them together with this patch.

I also have a few specific questions about this patch.

I am not sure how the mirror VM will be supported in QEMU. Usually there is one QEMU process per VM. Now we would need to run a second VM and communicate with it during migration. Is there a way to do this without adding significant complexity?

You say that SEV-ES is not supported. While there are challenges regarding setting the CPU state of the mirror, I think there may also be larger issues with using the mirror for SEV-ES. With plain SEV, the migration handler only has to worry about guest memory. With SEV-ES, the MH will probably need to set the CPU state of the guest as well. It seems difficult to do this with an MH that is in a separate VM entirely. Is there an expectation that the mirror-based approach will ever work with SEV-ES?

I am curious where you plan on putting the migration handler itself. We were drawn to OVMF because it is measured by the PSP. Do you have some alternate approach?

Do you plan to support consecutive migrations (target of first migration is source of second)? This is really just a question about the lifetime of the MH. Will the mirror VM be started and stopped dynamically or will it persist for the life of the guest on both source and target?

Finally, do you plan to use AMD PSP-based migration to migrate parts of the mirror VM or of the primary VM? The migration handler we've developed does not use PSP-based migration at all; instead it relies on secret injection to both source and target VMs to keep the migration keys secure. There are trade-offs either way.

-Tobin