Re: [PATCH 4/4] KVM: use signals to abort enter_guest/blocking and retry

From: Emanuele Giuseppe Esposito
Date: Mon Oct 24 2022 - 03:49:53 EST




Am 24/10/2022 um 09:43 schrieb Emanuele Giuseppe Esposito:
>
>
> Am 23/10/2022 um 19:48 schrieb Paolo Bonzini:
>> On 10/22/22 17:48, Emanuele Giuseppe Esposito wrote:
>>> Once a vcpu exectues KVM_RUN, it could enter two states:
>>> enter guest mode, or block/halt.
>>> Use a signal to allow a vcpu to exit the guest state or unblock,
>>> so that it can exit KVM_RUN and release the read semaphore,
>>> allowing a pending KVM_KICK_ALL_RUNNING_VCPUS to continue.
>>>
>>> Note that the signal is not deleted and used to propagate the
>>> exit reason till vcpu_run(). It will be clearead only by
>>> KVM_RESUME_ALL_KICKED_VCPUS. This allows the vcpu to keep try
>>> entering KVM_RUN and perform again all checks done in
>>> kvm_arch_vcpu_ioctl_run() before entering the guest state,
>>> where it will return again if the request is still set.
>>>
>>> However, the userspace hypervisor should also try to avoid
>>> continuously calling KVM_RUN after invoking KVM_KICK_ALL_RUNNING_VCPUS,
>>> because such call will just translate in a back-to-back down_read()
>>> and up_read() (thanks to the signal).
>>
>> Since the userspace should anyway avoid going into this effectively-busy
>> wait, what about clearing the request after the first exit?  The
>> cancellation ioctl can be kept for vCPUs that are never entered after
>> KVM_KICK_ALL_RUNNING_VCPUS.  Alternatively, kvm_clear_all_cpus_request
>> could be done right before up_write().
>
> Clearing makes sense, but should we "trust" the userspace not to go into
> busy wait?

Actually since this change is just a s/test/check, I would rather keep
test. If userspace does things wrong, this mechanism would still work
properly.

> What's the typical "contract" between KVM and the userspace? Meaning,
> should we cover the basic usage mistakes like forgetting to busy wait on
> KVM_RUN?
>
> If we don't, I can add a comment when clearing and of course also
> mention it in the API documentation (that I forgot to update, sorry :D)
>
> Emanuele
>
>>
>> Paolo
>>
>>> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@xxxxxxxxxx>
>>> ---
>>>   arch/x86/include/asm/kvm_host.h |  2 ++
>>>   arch/x86/kvm/x86.c              |  8 ++++++++
>>>   virt/kvm/kvm_main.c             | 21 +++++++++++++++++++++
>>>   3 files changed, 31 insertions(+)
>>>
>>> diff --git a/arch/x86/include/asm/kvm_host.h
>>> b/arch/x86/include/asm/kvm_host.h
>>> index aa381ab69a19..d5c37f344d65 100644
>>> --- a/arch/x86/include/asm/kvm_host.h
>>> +++ b/arch/x86/include/asm/kvm_host.h
>>> @@ -108,6 +108,8 @@
>>>       KVM_ARCH_REQ_FLAGS(30, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
>>>   #define KVM_REQ_MMU_FREE_OBSOLETE_ROOTS \
>>>       KVM_ARCH_REQ_FLAGS(31, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
>>> +#define KVM_REQ_USERSPACE_KICK        \
>>> +    KVM_ARCH_REQ_FLAGS(32, KVM_REQUEST_WAIT)
>>>     #define
>>> CR0_RESERVED_BITS                                               \
>>>       (~(unsigned long)(X86_CR0_PE | X86_CR0_MP | X86_CR0_EM |
>>> X86_CR0_TS \
>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>>> index b0c47b41c264..2af5f427b4e9 100644
>>> --- a/arch/x86/kvm/x86.c
>>> +++ b/arch/x86/kvm/x86.c
>>> @@ -10270,6 +10270,10 @@ static int vcpu_enter_guest(struct kvm_vcpu
>>> *vcpu)
>>>       }
>>>         if (kvm_request_pending(vcpu)) {
>>> +        if (kvm_test_request(KVM_REQ_USERSPACE_KICK, vcpu)) {
>>> +            r = -EINTR;
>>> +            goto out;
>>> +        }
>>>           if (kvm_check_request(KVM_REQ_VM_DEAD, vcpu)) {
>>>               r = -EIO;
>>>               goto out;
>>> @@ -10701,6 +10705,10 @@ static int vcpu_run(struct kvm_vcpu *vcpu)
>>>               r = vcpu_block(vcpu);
>>>           }
>>>   +        /* vcpu exited guest/unblocked because of this request */
>>> +        if (kvm_test_request(KVM_REQ_USERSPACE_KICK, vcpu))
>>> +            return -EINTR;
>>> +
>>>           if (r <= 0)
>>>               break;
>>>   diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>>> index ae0240928a4a..13fa7229b85d 100644
>>> --- a/virt/kvm/kvm_main.c
>>> +++ b/virt/kvm/kvm_main.c
>>> @@ -3431,6 +3431,8 @@ static int kvm_vcpu_check_block(struct kvm_vcpu
>>> *vcpu)
>>>           goto out;
>>>       if (kvm_check_request(KVM_REQ_UNBLOCK, vcpu))
>>>           goto out;
>>> +    if (kvm_test_request(KVM_REQ_USERSPACE_KICK, vcpu))
>>> +        goto out;
>>>         ret = 0;
>>>   out:
>>> @@ -4668,6 +4670,25 @@ static long kvm_vm_ioctl(struct file *filp,
>>>           r = kvm_vm_ioctl_enable_cap_generic(kvm, &cap);
>>>           break;
>>>       }
>>> +    case KVM_KICK_ALL_RUNNING_VCPUS: {
>>> +        /*
>>> +         * Notify all running vcpus that they have to stop.
>>> +         * Caught in kvm_arch_vcpu_ioctl_run()
>>> +         */
>>> +        kvm_make_all_cpus_request(kvm, KVM_REQ_USERSPACE_KICK);
>>> +
>>> +        /*
>>> +         * Use wr semaphore to wait for all vcpus to exit from KVM_RUN.
>>> +         */
>>> +        down_write(&memory_transaction);
>>> +        up_write(&memory_transaction);
>>> +        break;
>>> +    }
>>> +    case KVM_RESUME_ALL_KICKED_VCPUS: {
>>> +        /* Remove all requests sent with KVM_KICK_ALL_RUNNING_VCPUS */
>>> +        kvm_clear_all_cpus_request(kvm, KVM_REQ_USERSPACE_KICK);
>>> +        break;
>>> +    }
>>>       case KVM_SET_USER_MEMORY_REGION: {
>>>           struct kvm_userspace_memory_region kvm_userspace_mem;
>>>  
>>