Re: [PATCH v2] KVM: nVMX: Fix attempting to emulate "Acknowledge interrupt on exit" when there is no interrupt which L1 requires to inject to L2

From: Wanpeng Li
Date: Tue Aug 01 2017 - 18:42:34 EST


2017-08-02 3:59 GMT+08:00 Radim KrÄmÃÅ <rkrcmar@xxxxxxxxxx>:
> 2017-07-31 19:25-0700, Wanpeng Li:
>> From: Wanpeng Li <wanpeng.li@xxxxxxxxxxx>
>>
>> ------------[ cut here ]------------
>> WARNING: CPU: 5 PID: 2288 at arch/x86/kvm/vmx.c:11124 nested_vmx_vmexit+0xd64/0xd70 [kvm_intel]
>> CPU: 5 PID: 2288 Comm: qemu-system-x86 Not tainted 4.13.0-rc2+ #7
>> RIP: 0010:nested_vmx_vmexit+0xd64/0xd70 [kvm_intel]
>> Call Trace:
>> vmx_check_nested_events+0x131/0x1f0 [kvm_intel]
>> ? vmx_check_nested_events+0x131/0x1f0 [kvm_intel]
>> kvm_arch_vcpu_ioctl_run+0x5dd/0x1be0 [kvm]
>> ? vmx_vcpu_load+0x1be/0x220 [kvm_intel]
>> ? kvm_arch_vcpu_load+0x62/0x230 [kvm]
>> kvm_vcpu_ioctl+0x340/0x700 [kvm]
>> ? kvm_vcpu_ioctl+0x340/0x700 [kvm]
>> ? __fget+0xfc/0x210
>> do_vfs_ioctl+0xa4/0x6a0
>> ? __fget+0x11d/0x210
>> SyS_ioctl+0x79/0x90
>> do_syscall_64+0x8f/0x750
>> ? trace_hardirqs_on_thunk+0x1a/0x1c
>> entry_SYSCALL64_slow_path+0x25/0x25
>>
>> This can be reproduced by booting L1 guest w/ 'noapic' grub parameter, which
>> means that tells the kernel to not make use of any IOAPICs that may be present
>> in the system.
>>
>> Actually external_intr variable in nested_vmx_vmexit() is the req_int_win
>> variable passed from vcpu_enter_guest() which means that the L0's userspace
>> requests an irq window. I observed the scenario (!kvm_cpu_has_interrupt(vcpu) &&
>> L0's userspace reqeusts an irq window) is true, so there is no interrupt which
>> L1 requires to inject to L2, we should not attempt to emualte "Acknowledge
>> interrupt on exit" for the irq window requirement in this scenario.
>>
>> This patch fixes it by not attempt to emulate "Acknowledge interrupt on exit"
>> if there is no L1 requirement to inject an interrupt to L2.
>>
>> Cc: Paolo Bonzini <pbonzini@xxxxxxxxxx>
>> Cc: Radim KrÄmÃÅ <rkrcmar@xxxxxxxxxx>
>> Signed-off-by: Wanpeng Li <wanpeng.li@xxxxxxxxxxx>
>> ---
>> v1 -> v2:
>> * update patch description
>> * check nested_exit_intr_ack_set() first
>>
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> @@ -11118,8 +11118,9 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason,
>>
>> vmx_switch_vmcs(vcpu, &vmx->vmcs01);
>>
>> - if ((exit_reason == EXIT_REASON_EXTERNAL_INTERRUPT)
>> - && nested_exit_intr_ack_set(vcpu)) {
>> + if (nested_exit_intr_ack_set(vcpu) &&
>> + exit_reason == EXIT_REASON_EXTERNAL_INTERRUPT &&
>> + kvm_cpu_has_interrupt(vcpu)) {
>
> This would work as a solution, but I don't think it's the correct
> behavior.
>
> SDM says that with acknowledge interrupt on exit, bit 31 of the VM-exit
> interrupt information (valid interrupt) is always set to 1 on
> EXIT_REASON_EXTERNAL_INTERRUPT. We don't want to break hypervisors
> expecting an interrupt in that case, so we should do a userspace VM exit
> when the window is open and then inject the userspace interrupt with a
> VM exit.

Agreed.

>
> The simplest thing that came to my mind is to:
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 39a6222bf968..9ad0c882c4f5 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -10687,7 +10687,8 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu, bool external_intr)
> return 0;
> }
>
> - if ((kvm_cpu_has_interrupt(vcpu) || external_intr) &&
> + if ((kvm_cpu_has_interrupt(vcpu) ||
> + (external_intr && !nested_exit_intr_ack_set(vcpu))) &&
> nested_exit_on_intr(vcpu)) {
> if (vmx->nested.nested_run_pending)
> return -EBUSY;
>

Agreed.

> but I think it could break more ... actually, why was the window closed?
>
> kvm_vcpu_ready_for_interrupt_injection() checks vmx_interrupt_allowed()
> in order to decide need for the window, but vmx_check_nested_events()
> doesn't care about that at all, so the window might just appear closed.
> Would the following hunk help too?

In addition, the request window can be requested by L0's userspace
(kvm_arch_pre_run), and the idea below still can't fix in my testing.

Regards,
Wanpeng Li

>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 39a6222bf968..7e6caa9c225d 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -5567,8 +5567,10 @@ static int vmx_nmi_allowed(struct kvm_vcpu *vcpu)
>
> static int vmx_interrupt_allowed(struct kvm_vcpu *vcpu)
> {
> - return (!to_vmx(vcpu)->nested.nested_run_pending &&
> - vmcs_readl(GUEST_RFLAGS) & X86_EFLAGS_IF) &&
> + if (is_guest_mode(vcpu))
> + return !to_vmx(vcpu)->nested.nested_run_pending;
> +
> + return vmcs_readl(GUEST_RFLAGS) & X86_EFLAGS_IF &&
> !(vmcs_read32(GUEST_INTERRUPTIBILITY_INFO) &
> (GUEST_INTR_STATE_STI | GUEST_INTR_STATE_MOV_SS));
> }
>
> (It doesn't prevent malicious userspace from hitting the WARN, though.)
>
> Thanks.