Re: [PATCH 4/4] KVM: use signals to abort enter_guest/blocking and retry
From: Emanuele Giuseppe Esposito
Date: Mon Oct 24 2022 - 03:49:53 EST
Am 24/10/2022 um 09:43 schrieb Emanuele Giuseppe Esposito:
>
>
> Am 23/10/2022 um 19:48 schrieb Paolo Bonzini:
>> On 10/22/22 17:48, Emanuele Giuseppe Esposito wrote:
>>> Once a vcpu exectues KVM_RUN, it could enter two states:
>>> enter guest mode, or block/halt.
>>> Use a signal to allow a vcpu to exit the guest state or unblock,
>>> so that it can exit KVM_RUN and release the read semaphore,
>>> allowing a pending KVM_KICK_ALL_RUNNING_VCPUS to continue.
>>>
>>> Note that the signal is not deleted and used to propagate the
>>> exit reason till vcpu_run(). It will be clearead only by
>>> KVM_RESUME_ALL_KICKED_VCPUS. This allows the vcpu to keep try
>>> entering KVM_RUN and perform again all checks done in
>>> kvm_arch_vcpu_ioctl_run() before entering the guest state,
>>> where it will return again if the request is still set.
>>>
>>> However, the userspace hypervisor should also try to avoid
>>> continuously calling KVM_RUN after invoking KVM_KICK_ALL_RUNNING_VCPUS,
>>> because such call will just translate in a back-to-back down_read()
>>> and up_read() (thanks to the signal).
>>
>> Since the userspace should anyway avoid going into this effectively-busy
>> wait, what about clearing the request after the first exit? The
>> cancellation ioctl can be kept for vCPUs that are never entered after
>> KVM_KICK_ALL_RUNNING_VCPUS. Alternatively, kvm_clear_all_cpus_request
>> could be done right before up_write().
>
> Clearing makes sense, but should we "trust" the userspace not to go into
> busy wait?
Actually since this change is just a s/test/check, I would rather keep
test. If userspace does things wrong, this mechanism would still work
properly.
> What's the typical "contract" between KVM and the userspace? Meaning,
> should we cover the basic usage mistakes like forgetting to busy wait on
> KVM_RUN?
>
> If we don't, I can add a comment when clearing and of course also
> mention it in the API documentation (that I forgot to update, sorry :D)
>
> Emanuele
>
>>
>> Paolo
>>
>>> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@xxxxxxxxxx>
>>> ---
>>> arch/x86/include/asm/kvm_host.h | 2 ++
>>> arch/x86/kvm/x86.c | 8 ++++++++
>>> virt/kvm/kvm_main.c | 21 +++++++++++++++++++++
>>> 3 files changed, 31 insertions(+)
>>>
>>> diff --git a/arch/x86/include/asm/kvm_host.h
>>> b/arch/x86/include/asm/kvm_host.h
>>> index aa381ab69a19..d5c37f344d65 100644
>>> --- a/arch/x86/include/asm/kvm_host.h
>>> +++ b/arch/x86/include/asm/kvm_host.h
>>> @@ -108,6 +108,8 @@
>>> KVM_ARCH_REQ_FLAGS(30, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
>>> #define KVM_REQ_MMU_FREE_OBSOLETE_ROOTS \
>>> KVM_ARCH_REQ_FLAGS(31, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
>>> +#define KVM_REQ_USERSPACE_KICK \
>>> + KVM_ARCH_REQ_FLAGS(32, KVM_REQUEST_WAIT)
>>> #define
>>> CR0_RESERVED_BITS \
>>> (~(unsigned long)(X86_CR0_PE | X86_CR0_MP | X86_CR0_EM |
>>> X86_CR0_TS \
>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>>> index b0c47b41c264..2af5f427b4e9 100644
>>> --- a/arch/x86/kvm/x86.c
>>> +++ b/arch/x86/kvm/x86.c
>>> @@ -10270,6 +10270,10 @@ static int vcpu_enter_guest(struct kvm_vcpu
>>> *vcpu)
>>> }
>>> if (kvm_request_pending(vcpu)) {
>>> + if (kvm_test_request(KVM_REQ_USERSPACE_KICK, vcpu)) {
>>> + r = -EINTR;
>>> + goto out;
>>> + }
>>> if (kvm_check_request(KVM_REQ_VM_DEAD, vcpu)) {
>>> r = -EIO;
>>> goto out;
>>> @@ -10701,6 +10705,10 @@ static int vcpu_run(struct kvm_vcpu *vcpu)
>>> r = vcpu_block(vcpu);
>>> }
>>> + /* vcpu exited guest/unblocked because of this request */
>>> + if (kvm_test_request(KVM_REQ_USERSPACE_KICK, vcpu))
>>> + return -EINTR;
>>> +
>>> if (r <= 0)
>>> break;
>>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>>> index ae0240928a4a..13fa7229b85d 100644
>>> --- a/virt/kvm/kvm_main.c
>>> +++ b/virt/kvm/kvm_main.c
>>> @@ -3431,6 +3431,8 @@ static int kvm_vcpu_check_block(struct kvm_vcpu
>>> *vcpu)
>>> goto out;
>>> if (kvm_check_request(KVM_REQ_UNBLOCK, vcpu))
>>> goto out;
>>> + if (kvm_test_request(KVM_REQ_USERSPACE_KICK, vcpu))
>>> + goto out;
>>> ret = 0;
>>> out:
>>> @@ -4668,6 +4670,25 @@ static long kvm_vm_ioctl(struct file *filp,
>>> r = kvm_vm_ioctl_enable_cap_generic(kvm, &cap);
>>> break;
>>> }
>>> + case KVM_KICK_ALL_RUNNING_VCPUS: {
>>> + /*
>>> + * Notify all running vcpus that they have to stop.
>>> + * Caught in kvm_arch_vcpu_ioctl_run()
>>> + */
>>> + kvm_make_all_cpus_request(kvm, KVM_REQ_USERSPACE_KICK);
>>> +
>>> + /*
>>> + * Use wr semaphore to wait for all vcpus to exit from KVM_RUN.
>>> + */
>>> + down_write(&memory_transaction);
>>> + up_write(&memory_transaction);
>>> + break;
>>> + }
>>> + case KVM_RESUME_ALL_KICKED_VCPUS: {
>>> + /* Remove all requests sent with KVM_KICK_ALL_RUNNING_VCPUS */
>>> + kvm_clear_all_cpus_request(kvm, KVM_REQ_USERSPACE_KICK);
>>> + break;
>>> + }
>>> case KVM_SET_USER_MEMORY_REGION: {
>>> struct kvm_userspace_memory_region kvm_userspace_mem;
>>>
>>