Re: [PATCH] KVM: x86: drop erroneous mmu_check_root() from fast_pgd_switch()
From: Vitaly Kuznetsov
Date: Wed Jul 01 2020 - 03:14:31 EST
Junaid Shahid <junaids@xxxxxxxxxx> writes:
> On 6/30/20 3:07 AM, Vitaly Kuznetsov wrote:
>> Undesired triple fault gets injected to L1 guest on SVM when L2 is
>> launched with certain CR3 values. It seems the mmu_check_root()
>> check in fast_pgd_switch() is wrong: first of all we don't know
>> if 'new_pgd' is a GPA or a nested GPA and, in case it is a nested
>> GPA, we can't check it with kvm_is_visible_gfn().
>>
>> The problematic code path is:
>> nested_svm_vmrun()
>> ...
>> nested_prepare_vmcb_save()
>> kvm_set_cr3(..., nested_vmcb->save.cr3)
>> kvm_mmu_new_pgd()
>> ...
>> mmu_check_root() -> TRIPLE FAULT
>>
>> The mmu_check_root() check in fast_pgd_switch() seems to be
>> superfluous even for non-nested case: when GPA is outside of the
>> visible range cached_root_available() will fail for non-direct
>> roots (as we can't have a matching one on the list) and we don't
>> seem to care for direct ones.
>>
>> Also, raising #TF immediately when a non-existent GFN is written to CR3
>> doesn't seem to mach architecture behavior.
>>
>> Fixes: 7c390d350f8b ("kvm: x86: Add fast CR3 switch code path")
>> Signed-off-by: Vitaly Kuznetsov <vkuznets@xxxxxxxxxx>
>> ---
>> - The patch fixes the immediate issue and doesn't seem to break any
>> tests even with shadow PT but I'm not sure I properly understood
>> why the check was there in the first place. Please review!
>> ---
>> arch/x86/kvm/mmu/mmu.c | 3 +--
>> 1 file changed, 1 insertion(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
>> index 76817d13c86e..286c74d2ae8d 100644
>> --- a/arch/x86/kvm/mmu/mmu.c
>> +++ b/arch/x86/kvm/mmu/mmu.c
>> @@ -4277,8 +4277,7 @@ static bool fast_pgd_switch(struct kvm_vcpu *vcpu, gpa_t new_pgd,
>> */
>> if (mmu->shadow_root_level >= PT64_ROOT_4LEVEL &&
>> mmu->root_level >= PT64_ROOT_4LEVEL)
>> - return !mmu_check_root(vcpu, new_pgd >> PAGE_SHIFT) &&
>> - cached_root_available(vcpu, new_pgd, new_role);
>> + return cached_root_available(vcpu, new_pgd, new_role);
>>
>> return false;
>> }
>>
>
> The check does seem superfluous, so should be ok to remove. Though I
> think that fast_pgd_switch() really should be getting only L1
> GPAs. Otherwise, there could be confusion between the same GPAs from
> two different L2s.
>
> IIUC, at least on Intel, only L1 CR3s (including shadow L1 CR3s for
> L2) or L1 EPTPs should get to fast_pgd_switch(). But I am not familiar
> enough with SVM to see why an L2 GPA would end up there. From a
> cursory look, it seems that until "978ce5837c7e KVM: SVM: always
> update CR3 in VMCB", enter_svm_guest_mode() was calling kvm_set_cr3()
> only when using shadow paging, in which case I assume that
> nested_vmcb->save.cr3 would have been an L1 CR3 shadowing the L2 CR3,
> correct? But now kvm_set_cr3() is called even when not using shadow
> paging, which I suppose is how we are getting the L2 CR3. Should we
> skip calling fast_pgd_switch() in that particular case?
Thank you for your thoughts, this is helpful indeed.
As far as I can see, nVMX calls kvm_mmu_new_pgd() directly in two cases:
1) nested_ept_init_mmu_context() -> kvm_init_shadow_ept_mmu() when
switching to L2 (and 'new_pgd' is EPTP)
2) nested_vmx_load_cr3() when !nested_ept (and 'new_pgd' is 'cr3')
I think we need to do something similar for nSVM to make the root cache
work and work correctly:
1) supplement nested_svm_init_mmu_context() -> kvm_init_shadow_mmu()
with kvm_mmu_new_pgd()
2) stop doing kvm_mmu_new_pgd() from nested_prepare_vmcb_save() ->
kvm_set_cr3() when npt_enabled
Let me try to experiment here a bit.
Thanks again!
--
Vitaly