If two page ready notifications happen back to back the second one is not
delivered and the only mechanism we currently have is
kvm_check_async_pf_completion() check in vcpu_run() loop. The check will
only be performed with the next vmexit when it happens and in some cases
it may take a while. With interrupt based page ready notification delivery
the situation is even worse: unlike exceptions, interrupts are not handled
immediately so we must check if the slot is empty. This is slow and
unnecessary. Introduce dedicated MSR_KVM_ASYNC_PF_ACK MSR to communicate
the fact that the slot is free and host should check its notification
queue. Mandate using it for interrupt based type 2 APF event delivery.
Signed-off-by: Vitaly Kuznetsov <vkuznets@xxxxxxxxxx>
---
Documentation/virt/kvm/msr.rst | 16 +++++++++++++++-
arch/x86/include/uapi/asm/kvm_para.h | 1 +
arch/x86/kvm/x86.c | 9 ++++++++-
3 files changed, 24 insertions(+), 2 deletions(-)
diff --git a/Documentation/virt/kvm/msr.rst b/Documentation/virt/kvm/msr.rst
index 7433e55f7184..18db3448db06 100644
--- a/Documentation/virt/kvm/msr.rst
+++ b/Documentation/virt/kvm/msr.rst
@@ -219,6 +219,11 @@ data:
If during pagefault APF reason is 0 it means that this is regular
page fault.
+ For interrupt based delivery, guest has to write '1' to
+ MSR_KVM_ASYNC_PF_ACK every time it clears reason in the shared
+ 'struct kvm_vcpu_pv_apf_data', this forces KVM to re-scan its
+ queue and deliver next pending notification.
+
During delivery of type 1 APF cr2 contains a token that will
be used to notify a guest when missing page becomes
available. When page becomes available type 2 APF is sent with
@@ -340,4 +345,13 @@ data:
To switch to interrupt based delivery of type 2 APF events guests
are supposed to enable asynchronous page faults and set bit 3 in
- MSR_KVM_ASYNC_PF_EN first.
+
+MSR_KVM_ASYNC_PF_ACK:
+ 0x4b564d07
+
+data:
+ Asynchronous page fault acknowledgment. When the guest is done
+ processing type 2 APF event and 'reason' field in 'struct
+ kvm_vcpu_pv_apf_data' is cleared it is supposed to write '1' to
+ Bit 0 of the MSR, this caused the host to re-scan its queue and
+ check if there are more notifications pending.
diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index 1bbb0b7e062f..5c7449980619 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -51,6 +51,7 @@
#define MSR_KVM_PV_EOI_EN 0x4b564d04
#define MSR_KVM_POLL_CONTROL 0x4b564d05
#define MSR_KVM_ASYNC_PF2 0x4b564d06
+#define MSR_KVM_ASYNC_PF_ACK 0x4b564d07
struct kvm_steal_time {
__u64 steal;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 861dce1e7cf5..e3b91ac33bfd 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1243,7 +1243,7 @@ static const u32 emulated_msrs_all[] = {
HV_X64_MSR_TSC_EMULATION_STATUS,
MSR_KVM_ASYNC_PF_EN, MSR_KVM_STEAL_TIME,
- MSR_KVM_PV_EOI_EN, MSR_KVM_ASYNC_PF2,
+ MSR_KVM_PV_EOI_EN, MSR_KVM_ASYNC_PF2, MSR_KVM_ASYNC_PF_ACK,
MSR_IA32_TSC_ADJUST,
MSR_IA32_TSCDEADLINE,
@@ -2915,6 +2915,10 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
if (kvm_pv_enable_async_pf2(vcpu, data))
return 1;
break;
+ case MSR_KVM_ASYNC_PF_ACK:
+ if (data & 0x1)
+ kvm_check_async_pf_completion(vcpu);
+ break;
case MSR_KVM_STEAL_TIME:
if (unlikely(!sched_info_on()))
@@ -3194,6 +3198,9 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
case MSR_KVM_ASYNC_PF2:
msr_info->data = vcpu->arch.apf.msr2_val;
break;
+ case MSR_KVM_ASYNC_PF_ACK:
+ msr_info->data = 0;
+ break;
case MSR_KVM_STEAL_TIME:
msr_info->data = vcpu->arch.st.msr_val;
break;