[PATCH v6 0/4] KVM: async_pf: Fix async pf exception injection

From: Wanpeng Li
Date: Wed Jun 28 2017 - 08:25:47 EST


INFO: task gnome-terminal-:1734 blocked for more than 120 seconds.
Not tainted 4.12.0-rc4+ #8
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
gnome-terminal- D 0 1734 1015 0x00000000
Call Trace:
__schedule+0x3cd/0xb30
schedule+0x40/0x90
kvm_async_pf_task_wait+0x1cc/0x270
? __vfs_read+0x37/0x150
? prepare_to_swait+0x22/0x70
do_async_page_fault+0x77/0xb0
? do_async_page_fault+0x77/0xb0
async_page_fault+0x28/0x30

This is triggered by running win7 and win2016 guests simultaneously on L1 KVM
and then putting L1 under memory stress. I can observe this hang on L1 once
at least ~70% of the swap area is occupied on L0.

The cause is that an async pf which should have been injected into L1 was
injected into L2 instead. The L2 guest then starts receiving page faults with
a bogus %cr2 (actually the apf token from the host), and the L1 guest starts
accumulating tasks stuck in D state in kvm_async_pf_task_wait() because it
never sees the matching PAGE_READY async_pfs.

This patchset fixes it according to Radim's proposal "force a nested VM exit
from nested_vmx_check_exception if the injected #PF is async_pf and handle
the #PF VM exit in L1". https://www.spinics.net/lists/kvm/msg142498.html

Note: The patchset barely touches SVM since I don't have an AMD CPU to verify
the modifications.

v5 -> v6:
* move vcpu_svm's apf_reason to vcpu->arch.apf.host_apf_reason
* introduce function kvm_handle_page_fault() to be used by both VMX/SVM
* add the SVM code posted by Paolo
* introduce nested_apf
* improve how MSR_KVM_ASYNC_PF_EN is set

v4 -> v5:
* utilize wrmsr_safe for MSR_KVM_ASYNC_PF_EN

v3 -> v4:
* reuse pad field in kvm_vcpu_events for async_page_fault
* update kvm_vcpu_events API documentation
* change async_page_fault type in vcpu->arch.exception from bool to u8

v2 -> v3:
* add the flag to the userspace interface (KVM_GET/PUT_VCPU_EVENTS)

v1 -> v2:
* remove nested_vmx_check_exception nr parameter
* construct a simple special vm-exit information field for async pf
* introduce nested_apf_token to vcpu->arch.apf to avoid changing the CR2
visible in the L2 guest
* avoid passing an apf directed at L1 into L2 if there is an L3
at the moment

Wanpeng Li (4):
KVM: x86: Simple kvm_x86_ops->queue_exception parameter
KVM: async_pf: Add L1 guest async_pf #PF vmexit handler
KVM: async_pf: Force a nested vmexit if the injected #PF is async_pf
KVM: async_pf: Let host know whether the guest support delivery async_pf as #PF vmexit

Documentation/virtual/kvm/api.txt | 8 +++--
Documentation/virtual/kvm/msr.txt | 5 +--
arch/x86/include/asm/kvm_emulate.h | 1 +
arch/x86/include/asm/kvm_host.h | 8 +++--
arch/x86/include/uapi/asm/kvm.h | 3 +-
arch/x86/include/uapi/asm/kvm_para.h | 1 +
arch/x86/kernel/kvm.c | 7 ++++-
arch/x86/kvm/mmu.c | 35 ++++++++++++++++++++-
arch/x86/kvm/mmu.h | 2 ++
arch/x86/kvm/svm.c | 58 ++++++++++++-----------------------
arch/x86/kvm/vmx.c | 39 ++++++++++++++---------
arch/x86/kvm/x86.c | 29 ++++++++++++------
tools/arch/x86/include/uapi/asm/kvm.h | 3 +-
13 files changed, 125 insertions(+), 74 deletions(-)

--
2.7.4