Re: [PATCH 0/4] KVM: API to block and resume all running vcpus in a vm

From: Paolo Bonzini
Date: Tue Oct 25 2022 - 17:34:25 EST


On 10/25/22 17:55, Sean Christopherson wrote:
> On Tue, Oct 25, 2022, Paolo Bonzini wrote:
>> That said, I believe the limited memslot API makes it more than just a QEMU
>> problem. Because KVM_GET_DIRTY_LOG cannot be combined atomically with
>> KVM_SET_USER_MEMORY_REGION(MR_DELETE), any VMM that uses dirty-log regions
>> while the VM is running is liable to losing the dirty status of some pages.
>
> ... and doesn't already do the sane thing and pause vCPUs _and anything else
> that can touch guest memory_ before modifying memslots. I honestly think QEMU
> is the only VMM that would ever use this API. Providing a way to force vCPUs
> out of KVM_RUN is at best half of the solution.

I agree this is not a full solution (and I do want to remove KVM_RESUME_ALL_KICKED_VCPUS).

> - a refcounting scheme to track the number of "holds" put on the system
> - serialization to ensure KVM_RESUME_ALL_KICKED_VCPUS completes before a new
>   KVM_KICK_ALL_RUNNING_VCPUS is initiated

Both of these can be just a mutex; the others are potentially more interesting, but I'm not sure I understand them:

> - to prevent _all_ ioctls() because it's not just KVM_RUN that consumes memslots

This is perhaps an occasion to settle another disagreement: I still think that accessing memory outside KVM_RUN (for example, KVM_SET_NESTED_STATE loading the APICv pages from VMCS12) is a bug; on the other hand, we disagreed on that, and you wanted to kill KVM_REQ_GET_NESTED_STATE_PAGES.

> - to stop anything else in the system that consumes KVM memslots, e.g. KVM GT

Is this true if you only look at the KVM_GET_DIRTY_LOG case and consider it a guest bug to access the memory? (That is, ignoring the strange read-only changes, which only happen at boot and which I agree are QEMU-specific.)

> - to signal vCPU tasks so that the system doesn't livelock if a vCPU is stuck
>   outside of KVM, e.g. in get_user_pages_unlocked() (Peter Xu's series)

This is the more important one, but why would it livelock?

> And because of the nature of KVM, to support this API on all architectures, KVM
> needs changes on all architectures, whereas userspace should be able to
> implement a generic solution.

Yes, I agree that this is essentially just a more efficient kill(). Emanuele, perhaps you can put together a patch to x86/vmexit.c in kvm-unit-tests, where CPU0 keeps changing memslots and the other CPUs are in a for(;;) busy wait, to measure the various ways to do it?

Paolo