Re: [PATCH v2] KVM: nVMX: Fix losing NMI blocking state
From: Wanpeng Li
Date: Tue Jul 25 2017 - 07:36:07 EST
2017-07-25 18:55 GMT+08:00 Paolo Bonzini <pbonzini@xxxxxxxxxx>:
> On 25/07/2017 12:40, Wanpeng Li wrote:
>> Commit 4c4a6f790ee862 (KVM: nVMX: track NMI blocking state separately for each VMCS)
>> tracks NMI blocking state separately for vmcs01 and vmcs02. However it is not enough:
>>
>> - The L2 (kvm-unit-tests/eventinj.flat) generates NMI that will fault on IRET, so the
>> L2 can generate #PF which can be intercepted by L0.
>> - L0 walks L1's guest page table and sees the mapping is invalid, it resumes the L1
>> guest and injects the #PF into L1.
>> - L1 is aware that it should set bit 3 (blocking by NMI) in the interruptibility-state field
>> and fix the shadow page table before resuming L2 guest.
>> - L1 executes VMRESUME to resume L2 which generates vmexit and causes L1 exit to L0
>> - L0 emulates the VMRESUME executed by L1; however, it loses the interruptibility-state
>> field value that L1 updated in vmcs12 when it prepares vmcs02
>> - .........
>
> The "..." part is not very enlightening. My understanding is:
>
> - The L2 (kvm-unit-tests/eventinj.flat) generates NMI that will fault
> on IRET, so the L2 can generate #PF which can be intercepted by L0.
> - L0 walks L1's guest page table and sees the mapping is invalid, it
> resumes the L1 guest and injects the #PF into L1. At this point the
> vmcs02 has nmi_known_unmasked=true.
> - L1 sets bit 3 (blocking by NMI) in the interruptibility-state field
> of vmcs12 (and fixes the shadow page table) before resuming L2 guest.
> - L1 executes VMRESUME to resume L2, causing a vmexit to L0
> - during VMRESUME emulation, prepare_vmcs02 sets bit 3 in the
> interruptibility-state field of vmcs02, but nmi_known_unmasked is
> still true.
> - on the next L2 exit to L0, nmi_known_unmasked is true so
> vmx_recover_nmi_blocking does not do anything.
Thanks for that. :)
>
> Can you explain instead what happens if your v1 patch is applied (on top of mine),
> and why it fixes the bug.
With the patch, the expected guest interruptibility-state field is set
before the final step: L0 fixes the shadow page table (NGVA -> HPA), then
L0 resumes the guest w/ the expected guest interruptibility-state.
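
To make the stale-flag sequence above concrete, here is a toy model in
plain C. It is only a sketch, not the kernel code: every name ending in
_model is hypothetical, and only nmi_known_unmasked, bit 3 of the
interruptibility-state field, vmcs02 and vmcs12 come from this thread.
The refresh_cache path shows one way the cached flag could be kept
consistent with the field; it is an assumption for illustration and does
not claim to be what the patch does.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 3 of the interruptibility-state field: blocking by NMI. */
#define INTR_STATE_NMI_MODEL (1u << 3)

/* Toy stand-in for the per-VMCS state discussed in this thread. */
struct vmcs_model {
	uint32_t interruptibility; /* guest interruptibility-state field */
	bool nmi_known_unmasked;   /* software cache: "NMIs are not blocked" */
};

/*
 * Models VMRESUME emulation copying the value L1 wrote in vmcs12 into
 * vmcs02. With refresh_cache == false the cached flag is left untouched,
 * which is the stale case described above. With refresh_cache == true
 * (an assumption) the flag is recomputed from the copied value.
 */
static void prepare_vmcs02_model(struct vmcs_model *vmcs02,
				 uint32_t vmcs12_interruptibility,
				 bool refresh_cache)
{
	vmcs02->interruptibility = vmcs12_interruptibility;
	if (refresh_cache)
		vmcs02->nmi_known_unmasked =
			!(vmcs12_interruptibility & INTR_STATE_NMI_MODEL);
}

/* True unless the cache claims "unmasked" while bit 3 is set. */
static bool nmi_cache_consistent_model(const struct vmcs_model *v)
{
	bool blocked = (v->interruptibility & INTR_STATE_NMI_MODEL) != 0;

	return !(v->nmi_known_unmasked && blocked);
}

/* Models the exit path: recovery is skipped while the flag is true. */
static bool recover_skipped_model(const struct vmcs_model *v)
{
	return v->nmi_known_unmasked;
}
```

Starting from { .interruptibility = 0, .nmi_known_unmasked = true } and
copying in a value with bit 3 set, the model ends up inconsistent and
skips recovery when refresh_cache is false, and stays consistent when it
is true.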
>
> The patch is correct and almost obvious, but I'd like the commit message to be precise.
>
> (Also, does your machine have shadow VMCS support?)
A Haswell desktop w/ shadow vmcs enabled.
Regards,
Wanpeng Li