Re: [PATCH 1/2] KVM: nVMX: Fix nested #PF that breaks L1's vmlaunch/vmresume

From: Paolo Bonzini
Date: Fri Sep 15 2017 - 07:26:37 EST


On 15/09/2017 05:48, Wanpeng Li wrote:
> 2017-09-14 5:45 GMT+08:00 Paolo Bonzini <pbonzini@xxxxxxxxxx>:
>> On 13/09/2017 13:03, Wanpeng Li wrote:
>>> From: Wanpeng Li <wanpeng.li@xxxxxxxxxxx>
>>>
>>> ------------[ cut here ]------------
>>> WARNING: CPU: 4 PID: 5280 at /home/kernel/linux/arch/x86/kvm//vmx.c:11394 nested_vmx_vmexit+0xc2b/0xd70 [kvm_intel]
>>> CPU: 4 PID: 5280 Comm: qemu-system-x86 Tainted: G W OE 4.13.0+ #17
>>> RIP: 0010:nested_vmx_vmexit+0xc2b/0xd70 [kvm_intel]
>>> Call Trace:
>>> ? emulator_read_emulated+0x15/0x20 [kvm]
>>> ? segmented_read+0xae/0xf0 [kvm]
>>> vmx_inject_page_fault_nested+0x60/0x70 [kvm_intel]
>>> ? vmx_inject_page_fault_nested+0x60/0x70 [kvm_intel]
>>> x86_emulate_instruction+0x733/0x810 [kvm]
>>> vmx_handle_exit+0x2f4/0xda0 [kvm_intel]
>>> ? kvm_arch_vcpu_ioctl_run+0xd2f/0x1c60 [kvm]
>>> kvm_arch_vcpu_ioctl_run+0xdab/0x1c60 [kvm]
>>> ? kvm_arch_vcpu_load+0x62/0x230 [kvm]
>>> kvm_vcpu_ioctl+0x340/0x700 [kvm]
>>> ? kvm_vcpu_ioctl+0x340/0x700 [kvm]
>>> ? __fget+0xfc/0x210
>>> do_vfs_ioctl+0xa4/0x6a0
>>> ? __fget+0x11d/0x210
>>> SyS_ioctl+0x79/0x90
>>> entry_SYSCALL_64_fastpath+0x23/0xc2
>>>
>>> A nested #PF can be triggered while L0 is emulating an instruction for L2,
>>> but the handler does not consider that it must not break L1's
>>> vmlaunch/vmresume. Fix this by queuing the #PF exception instead: request
>>> an immediate VM exit from L2 and keep the exception for L1 pending for a
>>> subsequent nested VM exit.
>>>
>>> Cc: Paolo Bonzini <pbonzini@xxxxxxxxxx>
>>> Cc: Radim Krčmář <rkrcmar@xxxxxxxxxx>
>>> Signed-off-by: Wanpeng Li <wanpeng.li@xxxxxxxxxxx>
>>> ---
>>> arch/x86/kvm/vmx.c | 3 ++-
>>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>>> index 4253ade..fda9dd6 100644
>>> --- a/arch/x86/kvm/vmx.c
>>> +++ b/arch/x86/kvm/vmx.c
>>> @@ -9829,7 +9829,8 @@ static void vmx_inject_page_fault_nested(struct kvm_vcpu *vcpu,
>>>
>>>  	WARN_ON(!is_guest_mode(vcpu));
>>>
>>> -	if (nested_vmx_is_page_fault_vmexit(vmcs12, fault->error_code)) {
>>> +	if (nested_vmx_is_page_fault_vmexit(vmcs12, fault->error_code) &&
>>> +	    !to_vmx(vcpu)->nested.nested_run_pending) {
>>>  		vmcs12->vm_exit_intr_error_code = fault->error_code;
>>>  		nested_vmx_vmexit(vcpu, EXIT_REASON_EXCEPTION_NMI,
>>>  				  PF_VECTOR | INTR_TYPE_HARD_EXCEPTION |
>>>
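
For context, with the hunk above applied vmx_inject_page_fault_nested()
would read roughly as follows; the lines outside the hunk are a sketch
reconstructed from the surrounding code, not quoted verbatim from the tree:

static void vmx_inject_page_fault_nested(struct kvm_vcpu *vcpu,
					 struct x86_exception *fault)
{
	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);

	WARN_ON(!is_guest_mode(vcpu));

	/*
	 * Reflect the #PF to L1 only if L1 intercepts it and no
	 * vmlaunch/vmresume is currently being emulated, i.e. the
	 * vmentry into L2 has not completed yet.
	 */
	if (nested_vmx_is_page_fault_vmexit(vmcs12, fault->error_code) &&
	    !to_vmx(vcpu)->nested.nested_run_pending) {
		vmcs12->vm_exit_intr_error_code = fault->error_code;
		nested_vmx_vmexit(vcpu, EXIT_REASON_EXCEPTION_NMI,
				  PF_VECTOR | INTR_TYPE_HARD_EXCEPTION |
				  INTR_INFO_DELIVER_CODE_MASK |
				  INTR_INFO_VALID_MASK,
				  fault->error_code);
	} else
		/*
		 * Otherwise queue the #PF; it is either delivered to
		 * L2 or turned into a nested vmexit for L1 later.
		 */
		kvm_inject_page_fault(vcpu, fault);
}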
>>
>> Is vmx_inject_page_fault_nested even needed at all these days?
>>
>> kvm_inject_page_fault's call to kvm_queue_exception_e should transform
>> into an L2->L1 vmexit when vmx_check_nested_events is called.
>
> After more investigation, removing it would break the original goal of
> what vmx_inject_page_fault_nested() was introduced to fix:
> http://www.spinics.net/lists/kvm/msg96579.html

Right! I think I have a generic patch for the same issue that Gleb
solved there: we can fill in the IDT vectoring info early in the vmexit,
so that the L1 vmexit can easily overwrite the exception queued for L2.

Thanks,

Paolo