Re: [PATCH v3 18/19] KVM: MMU: mmio page fault support

From: Marcelo Tosatti
Date: Thu Jul 07 2011 - 04:51:47 EST


On Thu, Jul 07, 2011 at 03:59:12AM +0800, Xiao Guangrong wrote:
> On 07/07/2011 02:52 AM, Marcelo Tosatti wrote:
>
> >> +/*
> >> + * If it is a real mmio page fault, return 1 and emulate the instruction
> >> + * directly; return 0 to let the CPU fault again on the address; -1 is
> >> + * returned if a bug is detected.
> >> + */
> >> +int handle_mmio_page_fault_common(struct kvm_vcpu *vcpu, u64 addr, bool direct)
> >> +{
> >> +	u64 spte;
> >> +
> >> +	if (quickly_check_mmio_pf(vcpu, addr, direct))
> >> +		return 1;
> >> +
> >> +	spte = walk_shadow_page_get_mmio_spte(vcpu, addr);
> >> +
> >> +	if (is_mmio_spte(spte)) {
> >> +		gfn_t gfn = get_mmio_spte_gfn(spte);
> >> +		unsigned access = get_mmio_spte_access(spte);
> >> +
> >> +		if (direct)
> >> +			addr = 0;
> >> +		vcpu_cache_mmio_info(vcpu, addr, gfn, access);
> >> +		return 1;
> >> +	}
> >> +
> >> +	/*
> >> +	 * It's ok if the gva is remapped by other cpus on a shadow guest;
> >> +	 * it's a BUG if the gfn is not an mmio page.
> >> +	 */
> >> +	if (direct && is_shadow_present_pte(spte))
> >> +		return -1;
> >
>
> Marcelo,
>
> > This is only going to generate an spte dump, for a genuine EPT
> > misconfig, if the present bit is set.
> >
> > Should be:
> >
> > 	/*
> > 	 * Page table zapped by other cpus, let the CPU fault again on
> > 	 * the address.
> > 	 */
> > 	if (spte == 0ull)
> > 		return 0;
>
> We can not use "spte == 0ull" here; it should be !is_shadow_present_pte(spte)
> instead, since on an x86-32 host we can see the high 32 bits still set
> while the present bit is already cleared.

OK, then check for 0ull on x86-64, and for the low 32 bits zeroed with the
high 32 bits still carrying the mmio pattern on x86-32.

It is not acceptable to disable the spte dump for sptes that do not have the
present bit set.
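
Something like the below, as a rough sketch (untested; the helper name and
the mmio_mask parameter are mine, not from your patch):

#include <linux/types.h>

/*
 * On x86-64 a zapped spte is written back to 0 in a single 64-bit store.
 * On x86-32 the spte is cleared in two 32-bit stores, low word first, so
 * an mmio spte caught in the middle of being zapped has a zero low word
 * while the high word still carries the mmio pattern.
 */
static bool mmio_spte_zapped(u64 spte, u64 mmio_mask)
{
	if (spte == 0ull)
		return true;		/* fully zapped */

#ifndef CONFIG_X86_64
	{
		u32 high_mmio_mask = mmio_mask >> 32;

		/* being zapped: low word already cleared, mmio mark still set */
		if ((u32)spte == 0 &&
		    ((u32)(spte >> 32) & high_mmio_mask) == high_mmio_mask)
			return true;
	}
#endif

	return false;
}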

> > /* BUG if gfn is not an mmio page */
> > return -1;
>
> We can not detect the bug for soft-mmu, since the shadow page table can be
> changed at any time, for example:
>
>  VCPU 0                                 VCPU 1
>  mmio page fault is intercepted
>                                         change the guest page table and map
>                                         the virtual address to a RAM region
>  walk shadow page table, and
>  detect the gfn is RAM, spurious
>  BUG is reported
>
> In theory, this can happen.

Yes, with TDP, only assigning a memory slot can change an spte from mmio
to valid, and that can only happen under SRCU, so this path is protected.
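
For reference, the protection looks roughly like this (illustrative sketch
only; the structure and function names below are made up, not KVM's code):

#include <linux/slab.h>
#include <linux/srcu.h>

struct demo_slots {
	int nr_slots;
	/* ... gfn ranges, flags, ... */
};

struct demo_kvm {
	struct srcu_struct srcu;
	struct demo_slots __rcu *slots;
};

/* fault path: decides mmio vs. RAM against a stable view of the slots */
static bool demo_fault_is_mmio(struct demo_kvm *kvm)
{
	struct demo_slots *slots;
	bool mmio;
	int idx;

	idx = srcu_read_lock(&kvm->srcu);
	slots = srcu_dereference(kvm->slots, &kvm->srcu);
	mmio = (slots->nr_slots == 0);	/* stand-in for the real gfn lookup */
	srcu_read_unlock(&kvm->srcu, idx);

	return mmio;
}

/* slot update: publish the new slots, then wait for all readers */
static void demo_install_slots(struct demo_kvm *kvm, struct demo_slots *new)
{
	struct demo_slots *old = rcu_dereference_protected(kvm->slots, 1);

	rcu_assign_pointer(kvm->slots, new);
	synchronize_srcu(&kvm->srcu);	/* every fault path now sees @new */
	kfree(old);
}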

> >> +static bool sync_mmio_spte(u64 *sptep, gfn_t gfn, unsigned access,
> >> +			   int *nr_present)
> >> +{
> >> +	if (unlikely(is_mmio_spte(*sptep))) {
> >> +		if (gfn != get_mmio_spte_gfn(*sptep)) {
> >> +			mmu_spte_clear_no_track(sptep);
> >> +			return true;
> >> +		}
> >> +
> >> +		(*nr_present)++;
> >
> > Can increase nr_present in the caller.
> >
>
> Yes, we should increase it; otherwise an unsync shadow page that holds only
> mmio sptes would look like it has no present entries and would be freed.
>
> >> @@ -6481,6 +6506,13 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
> >>  	if (!kvm->arch.n_requested_mmu_pages)
> >>  		nr_mmu_pages = kvm_mmu_calculate_mmu_pages(kvm);
> >>
> >> +	/*
> >> +	 * If a new memory slot is created, we need to clear all
> >> +	 * mmio sptes.
> >> +	 */
> >> +	if (npages && old.base_gfn != mem->guest_phys_addr >> PAGE_SHIFT)
> >> +		kvm_arch_flush_shadow(kvm);
> >> +
> >
> > This should be in __kvm_set_memory_region.
> >
>
> Um, __kvm_set_memory_region is a common function, and only x86 supports mmio
> sptes; it seems there is no need to do this for all architectures?

Please do it in __kvm_set_memory_region, after
kvm_arch_prepare_memory_region (other arches won't mind the flush).
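
Something along these lines in the generic code, just to illustrate the
placement (untested; the condition is the one from your patch):

	r = kvm_arch_prepare_memory_region(kvm, &new, old, mem, user_alloc);
	if (r)
		goto out_free;

	/*
	 * A newly created slot may cover gfns that were handled as mmio
	 * before, so the mmio sptes cached for them must be zapped.
	 */
	if (npages && old.base_gfn != mem->guest_phys_addr >> PAGE_SHIFT)
		kvm_arch_flush_shadow(kvm);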
