Re: [PATCH v2 0/5] Add support for EPT execute only for nested hypervisors

From: Bandan Das
Date: Wed Jul 13 2016 - 11:07:50 EST


Paolo Bonzini <pbonzini@xxxxxxxxxx> writes:

> On 13/07/2016 00:18, Bandan Das wrote:
>> v1 of this series posted at https://lkml.org/lkml/2016/6/28/7
>>
>> Changes since v1:
>> - 1/5 : modify is_shadow_present_pte to check against 0xffffffff
>> Reasoning provided in commit message.
>> - 2/5 : Removed 2/5 from v1 since kvm doesn't use execute only.
>> 3/5 from v1 is now 2/5. Introduce shadow_present_mask that
>> signifies whether ept execute only is supported. Add/remove some
>> comments as suggested in v1.
>> - 3/5 : 4/5 from v1 is now 3/5.
>> - 4/5 : update_permission_bitmask now sets u=1 only if host doesn't
>> support ept execute only.
>> - 5/5 : No change
>
> These are the diffs I have after review, do they look okay?
>
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 190c0559c221..bd2535fdb9eb 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -2524,11 +2524,10 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
> return 0;
>
> /*
> - * In the non-EPT case, execonly is not valid and so
> - * the following line is equivalent to spte |= PT_PRESENT_MASK.
> * For the EPT case, shadow_present_mask is 0 if hardware
> - * supports it and we honor whatever way the guest set it.
> - * See: FNAME(gpte_access) in paging_tmpl.h
> + * supports exec-only page table entries. In that case,
> + * ACC_USER_MASK and shadow_user_mask are used to represent
> + * read access. See FNAME(gpte_access) in paging_tmpl.h.
> */

I would still prefer a note about the non-EPT case; it makes the code
easier to understand.
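To keep the non-EPT case in view, here is a minimal standalone sketch (not KVM code; the mask values mirror the real x86/VMX definitions, and `set_present()` is a hypothetical helper) of how `spte |= shadow_present_mask` behaves in the two configurations:

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for KVM's masks; both really are bit 0 on x86/VMX. */
#define PT_PRESENT_MASK       (1ull << 0)
#define VMX_EPT_READABLE_MASK (1ull << 0)

static uint64_t set_present(uint64_t spte, uint64_t shadow_present_mask)
{
	/*
	 * Non-EPT (and EPT without exec-only): shadow_present_mask is
	 * PT_PRESENT_MASK, so this is spte |= PT_PRESENT_MASK.
	 * EPT with exec-only support: shadow_present_mask is 0, and the
	 * read bit is carried by ACC_USER_MASK/shadow_user_mask instead.
	 */
	return spte | shadow_present_mask;
}
```

So the single OR covers both cases; only the value of shadow_present_mask changes at setup time.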

> spte |= shadow_present_mask;
> if (!speculative)
> @@ -3923,9 +3922,6 @@ static void update_permission_bitmask(struct kvm_vcpu *vcpu,
> * clearer.
> */
> smap = cr4_smap && u && !uf && !ff;
> - } else {
> - if (shadow_present_mask)
> - u = 1;
> }
>
> fault = (ff && !x) || (uf && !u) || (wf && !w) ||
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 576c47cda1a3..dfef081e76c0 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -6120,12 +6120,14 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
> gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
> trace_kvm_page_fault(gpa, exit_qualification);
>
> - /* It is a write fault? */
> + /* it is a read fault? */
> + error_code = (exit_qualification << 2) & PFERR_USER_MASK;
> + /* it is a write fault? */
> - error_code = exit_qualification & PFERR_WRITE_MASK;
> + error_code |= exit_qualification & PFERR_WRITE_MASK;
> /* It is a fetch fault? */
> error_code |= (exit_qualification << 2) & PFERR_FETCH_MASK;
> /* ept page table is present? */
> - error_code |= (exit_qualification >> 3) & PFERR_PRESENT_MASK;
> + error_code |= (exit_qualification & 0x38) != 0;
>

Thank you for the thorough review here. I missed that we didn't set the read bit
at all. I am still a little unclear on how permission_fault() works, though...
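For what it's worth, the bit mapping above can be checked in isolation. This standalone sketch (the PFERR_* values match KVM's definitions; `ept_error_code()` is a hypothetical helper, not the kernel function) reproduces the translation from the EPT violation exit qualification to the page-fault error code; note that the write bit has to be OR-ed in so it doesn't clobber the read bit:

```c
#include <assert.h>
#include <stdint.h>

/* PFERR_* bit positions as used by KVM's page-fault error code. */
#define PFERR_PRESENT_MASK (1u << 0)
#define PFERR_WRITE_MASK   (1u << 1)
#define PFERR_USER_MASK    (1u << 2)	/* repurposed as "read" for EPT */
#define PFERR_FETCH_MASK   (1u << 4)

/*
 * EPT violation exit qualification: bit 0 = read, bit 1 = write,
 * bit 2 = fetch; bits 3-5 report the R/W/X permissions the entry
 * granted, so (qual & 0x38) != 0 means the EPT entry was present.
 */
static uint32_t ept_error_code(uint64_t qual)
{
	uint32_t error_code;

	error_code  = (qual << 2) & PFERR_USER_MASK;	/* read  -> bit 2 */
	error_code |=  qual       & PFERR_WRITE_MASK;	/* write -> bit 1 */
	error_code |= (qual << 2) & PFERR_FETCH_MASK;	/* fetch -> bit 4 */
	error_code |= (qual & 0x38) != 0;		/* present -> bit 0 */
	return error_code;
}
```

Each access kind lands on its own PFERR bit, so a write to a present entry yields PFERR_WRITE_MASK | PFERR_PRESENT_MASK.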


> vcpu->arch.exit_qualification = exit_qualification;
>
> @@ -6474,8 +6476,7 @@ static __init int hardware_setup(void)
> (enable_ept_ad_bits) ? VMX_EPT_DIRTY_BIT : 0ull,
> 0ull, VMX_EPT_EXECUTABLE_MASK,
> cpu_has_vmx_ept_execute_only() ?
> - 0ull : PT_PRESENT_MASK);
> - BUILD_BUG_ON(PT_PRESENT_MASK != VMX_EPT_READABLE_MASK);
> + 0ull : VMX_EPT_READABLE_MASK);

I wanted to keep it the former way because "PT_PRESENT_MASK is equal to
VMX_EPT_READABLE_MASK" is an assumption made throughout, and the BUILD_BUG_ON
here would catch any mismatch at compile time.
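The invariant can also be written with C11's `_Static_assert`, which is essentially what the kernel's BUILD_BUG_ON amounts to. A minimal standalone sketch (mask values as on x86/VMX, where both really are bit 0):

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the kernel's definitions; both are bit 0 on x86/VMX. */
#define PT_PRESENT_MASK       (1ull << 0)
#define VMX_EPT_READABLE_MASK (1ull << 0)

/*
 * Compile-time check in the spirit of BUILD_BUG_ON: if the two masks
 * ever diverge, this translation unit stops compiling, so every place
 * that silently relies on the equality is protected.
 */
_Static_assert(PT_PRESENT_MASK == VMX_EPT_READABLE_MASK,
	       "PT_PRESENT_MASK must equal VMX_EPT_READABLE_MASK");
```

A failed check costs nothing at runtime; the mismatch surfaces as a build error.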

Bandan

> ept_set_mmio_spte_mask();
> kvm_enable_tdp();
> } else