[PATCH v15 0/6] KVM: s390: pv: implement lazy destroy for reboot

From: Claudio Imbrenda
Date: Mon Oct 10 2022 - 10:55:11 EST


Previously, when a protected VM was rebooted or shut down, its memory
was first made unprotected, and then the protected VM itself was
destroyed. Looping over the whole address space can take some time,
given the overhead of the various Ultravisor Calls (UVCs). As a result,
a reboot or a shutdown could take a long time, depending on how much
memory the guest was using.

This patch series implements a deferred destroy mechanism for protected
guests. When a protected guest is destroyed, its memory can be cleared
in the background, allowing the guest to restart or terminate
significantly faster than before.

There are 2 possibilities when a protected VM is torn down:
* it still has an address space associated (reboot case)
* it does not have an address space anymore (shutdown case)

For the reboot case, two new commands are available for the
KVM_S390_PV_COMMAND:

KVM_PV_ASYNC_CLEANUP_PREPARE: prepares the current protected VM for
asynchronous teardown. The current VM will then continue immediately
as non-protected. If a protected VM had already been set aside without
starting the teardown process, this call will fail. In this case the
userspace process should issue a normal KVM_PV_DISABLE instead.

KVM_PV_ASYNC_CLEANUP_PERFORM: tears down the protected VM previously
set aside for asynchronous teardown. This PV command should ideally be
issued by userspace from a separate thread. If a fatal signal is
received (or the process terminates naturally), the command will
return immediately without completing. The rest of the normal KVM
teardown process will take care of properly cleaning up all leftovers.
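The rules for the two commands can be summarized with a small toy
model. This is only a sketch under stated assumptions, not the kernel
implementation: every name below is hypothetical, a guest is reduced to
a count of protected pages, and a fatal signal is simulated by a page
count after which PERFORM stops.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model of the set-aside rules (hypothetical names, not kernel code).
 * Each guest is reduced to the number of protected pages it still owns.
 */
struct pv_model {
	long current_pages;   /* running guest's protected pages, 0 = non-protected */
	long set_aside_pages; /* guest waiting for ASYNC_CLEANUP_PERFORM */
	bool has_set_aside;   /* set aside, teardown not started yet */
	long leftover_pages;  /* left behind by an interrupted PERFORM */
};

/* Models KVM_PV_ASYNC_CLEANUP_PREPARE. */
static int model_prepare(struct pv_model *m)
{
	if (m->has_set_aside)
		return -1; /* userspace should fall back to KVM_PV_DISABLE */
	m->set_aside_pages = m->current_pages;
	m->has_set_aside = true;
	m->current_pages = 0; /* the VM continues as non-protected */
	return 0;
}

/*
 * Models KVM_PV_ASYNC_CLEANUP_PERFORM. Once teardown has started, the slot
 * is free again, so a new PREPARE can succeed. interrupt_after simulates a
 * fatal signal arriving after that many pages (negative: never).
 */
static int model_perform(struct pv_model *m, long interrupt_after)
{
	long pages = m->set_aside_pages;

	if (!m->has_set_aside)
		return -1;
	m->set_aside_pages = 0;
	m->has_set_aside = false;
	for (long done = 0; done < pages; done++) {
		if (done == interrupt_after) {
			/* stop immediately; the rest is left for normal teardown */
			m->leftover_pages += pages - done;
			return -1;
		}
		/* ... destroy one page ... */
	}
	return 0;
}

/* Models the normal KVM teardown, which cleans up everything left over. */
static void model_final_teardown(struct pv_model *m)
{
	m->current_pages = 0;
	m->set_aside_pages = 0;
	m->has_set_aside = false;
	m->leftover_pages = 0;
}
```

For example, a second PREPARE issued while the first guest is still
waiting in the slot returns -1, while an interrupted PERFORM leaves its
remaining pages for model_final_teardown().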

The idea is that userspace should first issue the
KVM_PV_ASYNC_CLEANUP_PREPARE command, and in case of success, create a
new thread and issue KVM_PV_ASYNC_CLEANUP_PERFORM from there. This also
allows for proper accounting of the CPU time needed for the
asynchronous teardown.
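The userspace sequence just described might look roughly like the
following sketch. It assumes a hypothetical callback, pv_issue_fn, that
would wrap the real ioctl (filling a struct kvm_pv_cmd and calling
ioctl(vm_fd, KVM_S390_PV_COMMAND, &cmd)); the enum values are
placeholders rather than the constants from <linux/kvm.h>, and
async_supported would presumably come from checking
KVM_CAP_S390_PROTECTED_ASYNC_DISABLE. The fake_issue() helper only
records commands so that the flow can be exercised without KVM.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical placeholders for the real command IDs in <linux/kvm.h>. */
enum sketch_pv_cmd { SK_PV_DISABLE, SK_PV_ASYNC_CLEANUP_PREPARE, SK_PV_ASYNC_CLEANUP_PERFORM };

/* Issues one PV command; returns 0 on success (hypothetical wrapper). */
typedef int (*pv_issue_fn)(void *ctx, enum sketch_pv_cmd cmd);

struct cleanup_job {
	pv_issue_fn issue;
	void *ctx;
};

static void *cleanup_thread_fn(void *arg)
{
	struct cleanup_job job = *(struct cleanup_job *)arg;

	free(arg);
	/* Runs in its own thread, so its CPU time is accounted to it. */
	job.issue(job.ctx, SK_PV_ASYNC_CLEANUP_PERFORM);
	return NULL;
}

/*
 * Reboot path: set the protected guest aside and tear it down in a helper
 * thread; otherwise fall back to a synchronous KVM_PV_DISABLE.
 * Returns 0 on success; *async tells whether *thread must be joined.
 */
static int pv_reboot_teardown(pv_issue_fn issue, void *ctx, bool async_supported,
			      pthread_t *thread, bool *async)
{
	struct cleanup_job *job;

	*async = false;
	if (async_supported && issue(ctx, SK_PV_ASYNC_CLEANUP_PREPARE) == 0) {
		job = malloc(sizeof(*job));
		if (job) {
			job->issue = issue;
			job->ctx = ctx;
			if (pthread_create(thread, NULL, cleanup_thread_fn, job) == 0) {
				*async = true;
				return 0;
			}
			free(job);
		}
		/* Guest is already set aside but no thread: tear it down here. */
		return issue(ctx, SK_PV_ASYNC_CLEANUP_PERFORM);
	}
	return issue(ctx, SK_PV_DISABLE);
}

/* Stand-in issuer for trying the flow without KVM: records commands. */
struct fake_kvm {
	int prepare_result;
	enum sketch_pv_cmd log[4];
	int n;
};

static int fake_issue(void *ctx, enum sketch_pv_cmd cmd)
{
	struct fake_kvm *k = ctx;

	if (k->n < 4)
		k->log[k->n++] = cmd;
	return cmd == SK_PV_ASYNC_CLEANUP_PREPARE ? k->prepare_result : 0;
}
```

Because PERFORM runs in the helper thread, the caller can restart the
guest right away; if PREPARE fails (for instance because an earlier
guest is still set aside), the sketch falls back to KVM_PV_DISABLE as
the text recommends.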

This means that the same address space can contain memory belonging to
more than one protected guest, although only one of them will be
running; the others will not even have any CPUs.

The shutdown case should be dealt with in userspace (e.g. using
clone(CLONE_VM)).

A module parameter is also provided to disable the new functionality,
which is otherwise enabled by default. Enabling it by default should
not be an issue, since userspace has to opt in to the new functionality
anyway. The parameter is mainly intended to aid debugging.

v14->v15
* fix some variable names
* improve comment in kvm_s390_pv_deinit_vm
* use existing macros instead of magic values for UVC_RC_EXECUTED
* add lockdep_assert_held to kvm_s390_pv_set_aside

v13->v14
* improve wording of commit messages
* improve wording of documentation
* improve wording of comments
* add if (!async_destroy) check in ioctl handler
* use UVC_RC_EXECUTED macro instead of hardcoded value
* use kzalloc instead of kmalloc with __GFP_ZERO flag
* rebase

v12->v13
* drop the patches that have been already merged
* rebase

Claudio Imbrenda (6):
KVM: s390: pv: asynchronous destroy for reboot
KVM: s390: pv: api documentation for asynchronous destroy
KVM: s390: pv: add KVM_CAP_S390_PROTECTED_ASYNC_DISABLE
KVM: s390: pv: avoid export before import if possible
KVM: s390: pv: support for Destroy fast UVC
KVM: s390: pv: module parameter to fence asynchronous destroy

Documentation/virt/kvm/api.rst   |  37 +++-
arch/s390/include/asm/kvm_host.h |   2 +
arch/s390/include/asm/uv.h       |  10 +
arch/s390/kernel/uv.c            |   7 +
arch/s390/kvm/kvm-s390.c         |  58 +++++-
arch/s390/kvm/kvm-s390.h         |   3 +
arch/s390/kvm/pv.c               | 336 ++++++++++++++++++++++++++++++-
include/uapi/linux/kvm.h         |   3 +
8 files changed, 434 insertions(+), 22 deletions(-)

--
2.37.3