RE: [RFC] KVM: x86: Support KVM VMs sharing SEV context

From: Kalra, Ashish
Date: Mon May 24 2021 - 17:29:08 EST


[AMD Public Use]

Hello Paolo,

I am working on prototype code in QEMU to start a mirror VM running in parallel to the primary VM. Initially I considered running a completely separate VM in parallel, for example using QEMU's microvm machine type, but the main issue with this approach is the difficulty of sharing the primary VM's memory with it.

So I started exploring running the mirror VM as an internal thread, similar to the existing per-vCPU threads in QEMU. The main issue is that QEMU has a lot of global state, especially the per-VM KVMState structure, and all the KVM vCPUs are very tightly tied to it. It does not make sense to add a completely new KVMState instance for the mirror VM, because then the mirror VM would no longer be lightweight at all.

Hence, the mirror VM I am adding has to integrate with the existing KVMState structure and the “global” KVM state in QEMU. This required adding some parallel KVM code in QEMU, for example to issue ioctls on the mirror VM in the same way as on the primary VM. Details below:

A mirror_vm_fd field is added to the KVMState structure itself.

The parallel code I mentioned looks like the following:

#define kvm_mirror_vm_enable_cap(s, capability, cap_flags, ...)      \
    ({                                                               \
        struct kvm_enable_cap cap = {                                \
            .cap = capability,                                       \
            .flags = cap_flags,                                      \
        };                                                           \
        uint64_t args_tmp[] = { __VA_ARGS__ };                       \
        size_t n = MIN(ARRAY_SIZE(args_tmp), ARRAY_SIZE(cap.args));  \
        memcpy(cap.args, args_tmp, n * sizeof(cap.args[0]));         \
        kvm_mirror_vm_ioctl(s, KVM_ENABLE_CAP, &cap);                \
    })
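The macro relies on GCC statement expressions and copies however many variadic arguments are given (at most the four slots of cap.args) into struct kvm_enable_cap. The self-contained sketch below exercises the same packing logic without /dev/kvm: the hypothetical record_ioctl() only captures the struct instead of calling ioctl(), and MIN/ARRAY_SIZE are defined locally because QEMU's versions come from its own headers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <linux/kvm.h>

#define MIN(a, b) ((a) < (b) ? (a) : (b))
#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))

/* Hypothetical stand-in for kvm_mirror_vm_ioctl(): records the request. */
static struct kvm_enable_cap last_cap;
static unsigned long last_type;

static int record_ioctl(void *s, unsigned long type, struct kvm_enable_cap *cap)
{
    (void)s;
    last_type = type;
    last_cap = *cap;
    return 0;
}

/* Same packing logic as kvm_mirror_vm_enable_cap(), routed to the recorder. */
#define demo_enable_cap(s, capability, cap_flags, ...)               \
    ({                                                               \
        struct kvm_enable_cap cap = {                                \
            .cap = capability,                                       \
            .flags = cap_flags,                                      \
        };                                                           \
        uint64_t args_tmp[] = { __VA_ARGS__ };                       \
        size_t n = MIN(ARRAY_SIZE(args_tmp), ARRAY_SIZE(cap.args));  \
        memcpy(cap.args, args_tmp, n * sizeof(cap.args[0]));         \
        record_ioctl(s, KVM_ENABLE_CAP, &cap);                       \
    })
```

For example, demo_enable_cap(NULL, 123, 0, 7, 8) leaves args[0] = 7, args[1] = 8 and the remaining slots zeroed by the designated initializer; the value 123 is just a placeholder capability number.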


+int kvm_mirror_vm_ioctl(KVMState *s, int type, ...)
+{
+    int ret;
+    void *arg;
+    va_list ap;
+
+    va_start(ap, type);
+    arg = va_arg(ap, void *);
+    va_end(ap);
+
+    trace_kvm_vm_ioctl(type, arg);
+    ret = ioctl(s->mirror_vm_fd, type, arg);
+    if (ret == -1) {
+        ret = -errno;
+    }
+    return ret;
+}
+

The vCPU ioctl code works as is.

kvm_arch_put_registers() also needed a mirror VM variant, kvm_arch_mirror_put_registers(). For example, putting MSRs on the mirror VM requires the in-kernel irqchip to be enabled on the mirror VM; otherwise kvm_put_msrs() fails. kvm_arch_mirror_put_registers() therefore keeps the mirror VM simpler: it does not put any MSRs and does not need in-kernel irqchip support.

I had a lot of issues dynamically adding a new vCPU (i.e., a new CPUState structure) because of QEMU's object model (QOM), which requires every QEMU object to embed its parent/base class object first, followed by the derived class fields. Since it was difficult to add a new CPU object dynamically, I had to reuse one of the vCPUs specified with “-smp” on the QEMU command line as the mirror vCPU. This also provides the X86CPU "backing" structure for the mirror vCPU's CPU object, which allows most of QEMU's KVM code to be used for the mirror vCPU. The mirror vCPU's CPU object also embeds the CPUX86State structure, which holds the mirror vCPU's register state.
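The layout rule behind this is QOM's C-style single inheritance: each derived object embeds its parent as the first member, so a pointer to the parent can be cast back to the derived type. A heavily simplified sketch with hypothetical Demo* types (real QOM uses Object, DeviceState, CPUState, X86CPU and checked cast macros such as X86_CPU()):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified types mimicking the QOM layout rule. */
typedef struct DemoCPUState {
    int cpu_index;              /* common, architecture-neutral state */
} DemoCPUState;

typedef struct DemoX86Env {
    uint64_t regs[16];          /* register state, in the role of CPUX86State */
    uint64_t rip;
} DemoX86Env;

typedef struct DemoX86CPU {
    DemoCPUState parent_obj;    /* parent must be the first member */
    DemoX86Env env;
} DemoX86CPU;

/* Downcast from the generic CPU pointer: valid because parent_obj is at offset 0. */
static DemoX86CPU *demo_x86_cpu(DemoCPUState *cs)
{
    static_assert(offsetof(DemoX86CPU, parent_obj) == 0,
                  "parent must be the first member");
    return (DemoX86CPU *)cs;
}
```

Reusing an existing -smp vCPU means the generic CPUState pointer already sits inside a fully constructed X86CPU, so its env (register state) comes for free.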

The mirror vCPU now runs a simpler KVM run loop: it has no in-kernel irqchip (interrupt controller), nor any kvmapic interrupt controller, supported or enabled. For now it still handles both I/O and MMIO.
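A minimal sketch of the exit dispatch such a simplified run loop might use, with no irqchip-related exits and hypothetical stub handlers for port I/O and MMIO. It operates on the struct kvm_run that KVM shares with userspace via mmap() of the vCPU fd; the enum and handler names are assumptions, and a real loop would call ioctl(vcpu_fd, KVM_RUN, 0) before each dispatch.

```c
#include <assert.h>
#include <stdint.h>
#include <linux/kvm.h>

enum mirror_action { MIRROR_CONTINUE, MIRROR_STOP, MIRROR_ERROR };

/* Hypothetical stub: a real VMM would emulate the port access here. */
static void mirror_handle_io(struct kvm_run *run)
{
    uint8_t *data = (uint8_t *)run + run->io.data_offset;
    (void)data;   /* io.size * io.count bytes of port data live here */
}

/* Hypothetical stub: a real VMM would emulate the MMIO access here. */
static void mirror_handle_mmio(struct kvm_run *run)
{
    (void)run->mmio.phys_addr;   /* mmio.len bytes are in mmio.data */
}

/* Decide what to do after one KVM_RUN returns. */
static enum mirror_action mirror_dispatch_exit(struct kvm_run *run)
{
    switch (run->exit_reason) {
    case KVM_EXIT_IO:
        mirror_handle_io(run);
        return MIRROR_CONTINUE;
    case KVM_EXIT_MMIO:
        mirror_handle_mmio(run);
        return MIRROR_CONTINUE;
    case KVM_EXIT_HLT:            /* without an in-kernel irqchip, HLT exits here */
        return MIRROR_STOP;
    default:
        return MIRROR_ERROR;
    }
}
```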

Looking forward to comments, feedback, and thoughts on the above approach.

Thanks,
Ashish

-----Original Message-----
From: Paolo Bonzini <pbonzini@xxxxxxxxxx>
Sent: Thursday, March 11, 2021 10:30 AM
To: Tobin Feldman-Fitzthum <tobin@xxxxxxxxxxxxx>; natet@xxxxxxxxxx
Cc: Dov Murik <dovmurik@xxxxxxxxxxxxxxxxxx>; Lendacky, Thomas <Thomas.Lendacky@xxxxxxx>; x86@xxxxxxxxxx; kvm@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; srutherford@xxxxxxxxxx; seanjc@xxxxxxxxxx; rientjes@xxxxxxxxxx; Singh, Brijesh <brijesh.singh@xxxxxxx>; Kalra, Ashish <Ashish.Kalra@xxxxxxx>; Laszlo Ersek <lersek@xxxxxxxxxx>; James Bottomley <jejb@xxxxxxxxxxxxx>; Hubertus Franke <frankeh@xxxxxxxxxx>
Subject: Re: [RFC] KVM: x86: Support KVM VMs sharing SEV context

On 11/03/21 16:30, Tobin Feldman-Fitzthum wrote:
> I am not sure how the mirror VM will be supported in QEMU. Usually
> there is one QEMU process per VM. Now we would need to run a second VM
> and communicate with it during migration. Is there a way to do this
> without adding significant complexity?

I can answer this part. I think this will actually be simpler than with auxiliary vCPUs. There will be a separate pair of VM+vCPU file descriptors within the same QEMU process, and some code to set up the memory map using KVM_SET_USER_MEMORY_REGION.

However, the code to run this VM will be very small as the VM does not have to do MMIO, interrupts, live migration (of itself), etc. It just starts up and communicates with QEMU using a mailbox at a predetermined address.
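The mailbox format is not specified here. As a purely hypothetical sketch, QEMU could post a command into a small structure at the agreed guest-physical address (accessed through its host mapping) and the auxiliary VM would poll it; with SEV, that page would have to be mapped unencrypted in the guest so both sides can read it. The layout, field names and command code below are all assumptions, and C11 atomics order the argument write before the command becomes visible.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mailbox layout at a predetermined guest address. */
struct mirror_mailbox {
    _Atomic uint32_t cmd;     /* 0 = idle, otherwise a command code */
    _Atomic uint32_t status;  /* written by the auxiliary VM when done */
    uint64_t arg;             /* command argument */
};

enum { MBOX_IDLE = 0, MBOX_CMD_EXAMPLE = 1 };   /* hypothetical codes */

/* Host (QEMU) side: post a command if the mailbox is idle. */
static bool mailbox_post(struct mirror_mailbox *mb, uint32_t cmd, uint64_t arg)
{
    if (atomic_load_explicit(&mb->cmd, memory_order_acquire) != MBOX_IDLE) {
        return false;   /* previous command still pending */
    }
    mb->arg = arg;
    atomic_store_explicit(&mb->status, 0, memory_order_relaxed);
    atomic_store_explicit(&mb->cmd, cmd, memory_order_release);
    return true;
}

/* Guest side, shown in C for illustration: fetch a pending command. */
static bool mailbox_poll(struct mirror_mailbox *mb, uint32_t *cmd, uint64_t *arg)
{
    uint32_t c = atomic_load_explicit(&mb->cmd, memory_order_acquire);

    if (c == MBOX_IDLE) {
        return false;
    }
    *cmd = c;
    *arg = mb->arg;
    return true;
}

/* Guest side: report completion and mark the mailbox idle again. */
static void mailbox_complete(struct mirror_mailbox *mb, uint32_t status)
{
    atomic_store_explicit(&mb->status, status, memory_order_relaxed);
    atomic_store_explicit(&mb->cmd, MBOX_IDLE, memory_order_release);
}
```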

I also think (but I'm not 100% sure) that the auxiliary VM does not have to watch changes in the primary VM's memory map (e.g. mapping and unmapping of BARs). In QEMU terms, the auxiliary VM's memory map tracks RAMBlocks, not MemoryRegions, which makes things much simpler.
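A sketch of mirroring the primary VM's RAM into the auxiliary VM with KVM_SET_USER_MEMORY_REGION, one memslot per RAM block, reusing the same host virtual addresses so both VMs are backed by the same pages. The ram_block_desc array is a hypothetical stand-in for walking QEMU's RAMBlocks, and the guest-physical addresses chosen for the auxiliary VM are an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Hypothetical description of one primary-VM RAM block. */
struct ram_block_desc {
    uint64_t gpa;    /* guest-physical address to use in the auxiliary VM */
    uint64_t size;   /* length in bytes, page aligned */
    void *host;      /* host virtual address backing the block */
};

/* Register each block as its own memslot. Returns 0 or a negative errno. */
static int mirror_map_ram(int mirror_vm_fd, const struct ram_block_desc *blocks,
                          size_t n)
{
    assert(n == 0 || blocks != NULL);

    for (size_t i = 0; i < n; i++) {
        struct kvm_userspace_memory_region region = {
            .slot = (uint32_t)i,
            .flags = 0,
            .guest_phys_addr = blocks[i].gpa,
            .memory_size = blocks[i].size,
            .userspace_addr = (uint64_t)(uintptr_t)blocks[i].host,
        };

        if (ioctl(mirror_vm_fd, KVM_SET_USER_MEMORY_REGION, &region) < 0) {
            return -errno;
        }
    }
    return 0;
}
```

Because the slots describe RAMBlocks rather than MemoryRegions, BAR remapping in the primary VM never has to be propagated to this map.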

There are already many examples of mini VMMs running special purpose VMs in the kernel's tools/testing/selftests/kvm directory, and I don't think the QEMU code would be any more complex than that.

Paolo