Re: [PATCH 0/4] KVM: API to block and resume all running vcpus in a vm

From: Christian Borntraeger
Date: Mon Oct 24 2022 - 05:10:15 EST




Am 24.10.22 um 10:33 schrieb Emanuele Giuseppe Esposito:


Am 24/10/2022 um 09:56 schrieb Christian Borntraeger:
Am 22.10.22 um 17:48 schrieb Emanuele Giuseppe Esposito:
This new API allows the userspace to stop all running
vcpus using KVM_KICK_ALL_RUNNING_VCPUS ioctl, and resume them with
KVM_RESUME_ALL_KICKED_VCPUS.
A "running" vcpu is a vcpu that is executing the KVM_RUN ioctl.

This serie is especially helpful to userspace hypervisors like
QEMU when they need to perform operations on memslots without the
risk of having a vcpu reading them in the meanwhile.
With "memslots operations" we mean grow, shrink, merge and split
memslots, which are not "atomic" because there is a time window
between the DELETE memslot operation and the CREATE one.
Currently, each memslot operation is performed with one or more
ioctls.
For example, merging two memslots into one would imply:
DELETE(m1)
DELETE(m2)
CREATE(m1+m2)

And a vcpu could attempt to read m2 right after it is deleted, but
before the new one is created.

Therefore the simplest solution is to pause all vcpus in the kvm
side, so that:
- userspace just needs to call the new API before making memslots
changes, keeping modifications to the minimum
- dirty page updates are also performed when vcpus are blocked, so
there is no time window between the dirty page ioctl and memslots
modifications, since vcpus are all stopped.
- no need to modify the existing memslots API
Isnt QEMU able to achieve the same goal today by forcing all vCPUs
into userspace with a signal? Can you provide some rationale why this
is better in the cover letter or patch description?

David Hildenbrand tried to propose something similar here:
https://github.com/davidhildenbrand/qemu/commit/86b1bf546a8d00908e33f7362b0b61e2be8dbb7a

While it is not optimized, I think it's more complex that the current
serie, since qemu should also make sure all running ioctls finish and
prevent the new ones from getting executed.

Also we can't use pause_all_vcpus()/resume_all_vcpus() because they drop
the BQL.

Would that be ok as rationale?

Yes that helps and should be part of the cover letter for the next iterations.