Re: [PATCH v4 3/3] KVM: nVMX: Emulate EPTP switching for the L1 hypervisor

From: Radim KrÄmÃÅ
Date: Wed Jul 12 2017 - 09:24:56 EST


2017-07-11 17:08-0400, Bandan Das:
> Radim KrÄmÃÅ <rkrcmar@xxxxxxxxxx> writes:
>
> > 2017-07-11 16:34-0400, Bandan Das:
> >> Radim KrÄmÃÅ <rkrcmar@xxxxxxxxxx> writes:
> >>
> >> > 2017-07-11 15:50-0400, Bandan Das:
> >> >> Radim KrÄmÃÅ <rkrcmar@xxxxxxxxxx> writes:
> >> >> > 2017-07-11 14:24-0400, Bandan Das:
> >> >> >> Bandan Das <bsd@xxxxxxxxxx> writes:
> >> >> >> > If there's a triple fault, I think it's a good idea to inject it
> >> >> >> > back. Basically, there's no need to take care of damage control
> >> >> >> > that L1 is intentionally doing.
> >> >> >> >
> >> >> >> >>> + goto fail;
> >> >> >> >>> + kvm_mmu_unload(vcpu);
> >> >> >> >>> + vmcs12->ept_pointer = address;
> >> >> >> >>> + kvm_mmu_reload(vcpu);
> >> >> >> >>
> >> >> >> >> I was thinking about something like this:
> >> >> >> >>
> >> >> >> >> kvm_mmu_unload(vcpu);
> >> >> >> >> old = vmcs12->ept_pointer;
> >> >> >> >> vmcs12->ept_pointer = address;
> >> >> >> >> if (kvm_mmu_reload(vcpu)) {
> >> >> >> >> /* pointer invalid, restore previous state */
> >> >> >> >> kvm_clear_request(KVM_REQ_TRIPLE_FAULT, vcpu);
> >> >> >> >> vmcs12->ept_pointer = old;
> >> >> >> >> kvm_mmu_reload(vcpu);
> >> >> >> >> goto fail;
> >> >> >> >> }
> >> >> >> >>
> >> >> >> >> The you can inherit the checks from mmu_check_root().
> >> >> >>
> >> >> >> Actually, thinking about this a bit more, I agree with you. Any fault
> >> >> >> with a vmfunc operation should end with a vmfunc vmexit, so this
> >> >> >> is a good thing to have. Thank you for this idea! :)
> >> >> >
> >> >> > SDM says
> >> >> >
> >> >> > IF tent_EPTP is not a valid EPTP value (would cause VM entry to fail
> >> >> > if in EPTP) THEN VMexit;
> >> >>
> >> >> This section here:
> >> >> As noted in Section 25.5.5.2, an execution of the
> >> >> EPTP-switching VM function that causes a VM exit (as specified
> >> >> above), uses the basic exit reason 59, indicating âVMFUNCâ.
> >> >> The length of the VMFUNC instruction is saved into the
> >> >> VM-exit instruction-length field. No additional VM-exit
> >> >> information is provided.
> >> >>
> >> >> Although, it adds (as specified above), from testing, any vmexit that
> >> >> happens as a result of the execution of the vmfunc instruction always
> >> >> has exit reason 59.
> >> >>
> >> >> IMO, the case David pointed out comes under "as a result of the
> >> >> execution of the vmfunc instruction", so I would prefer exiting
> >> >> with reason 59.
> >> >
> >> > Right, the exit reason is 59 for reasons that trigger a VM exit
> >> > (i.e. invalid EPTP value, the four below), but kvm_mmu_reload() checks
> >> > unrelated stuff.
> >> >
> >> > If the EPTP value is correct, then the switch should succeed.
> >> > If the EPTP is correct, but bogus, then the guest should get
> >> > EPT_MISCONFIG VM exit on its first access (when reading the
> >> > instruction). Source: I added
> >>
> >> My point is that we are using kvm_mmu_reload() to emulate eptp
> >> switching. If that emulation of vmfunc fails, it should exit with reason
> >> 59.
>>
>> Yeah, we just disagree on what is a vmfunc failure.
>>
>>> > vmcs_write64(EPT_POINTER, vmcs_read64(EPT_POINTER) | (1ULL << 40));
>>> >
>>> > shortly before a VMLAUNCH on L0. :)
>>>
>>> What happens if this ept pointer is actually in the eptp list and the guest
>>> switches to it using vmfunc ? I think it will exit with reason 59.
>>
>> I think otherwise, because it doesn't cause a VM entry failure on
>> bare-metal (and SDM says that we get a VM exit only if there would be a
>> VM entry failure).
>> I expect the vmfunc to succeed and to get a EPT_MISCONFIG right after.
>> (Experiment pending :])
>>
>>> > I think that we might be emulating this case incorrectly and throwing
>>> > triple faults when it should be VM exits in vcpu_run().
>>>
>>> No, I agree with not throwing a triple fault. We should clear it out.
>>> But we should emulate a vmfunc vmexit back to L1 when kvm_mmu_load fails.
>>
>> Here we disagree. I think that it's a bug do the VM exit, so we can
>
> Why do you think it's a bug ?

SDM defines a different behavior and hardware doesn't do that either.
There are only two reasons for a VMFUNC VM exit from EPTP switching:

1) ECX > 0
2) EPTP would cause VM entry to fail if in VMCS.EPT_POINTER

KVM can fail for other reasons because of its bugs, but that should be
notified to the guest in another way. Rebooting the guest is kind of
acceptable in that case.

> The eptp switching function really didn't
> succeed as far as our emulation goes when kvm_mmu_reload() fails.
> And as such, the generic vmexit failure event should be a vmfunc vmexit.

I interpret it as two separate events -- at first, the vmfunc succeeds
and when it later tries to access memory through the new EPTP (valid,
but not pointing into backed memory), it results in a EPT_MISCONFIG VM
exit.

> We cannot strictly follow the spec here, the spec doesn't even mention a way
> to emulate eptp switching. If setting up the switching succeeded and the
> new root pointer is invalid or whatever, I really don't care what happens
> next but this is not the case. We fail to get a new root pointer and without
> that, we can't even make a switch!

We just make it behave exactly how the spec says that it behaves. We do
have a value (we read 'address') to put into VMCS.EPT_POINTER, which is
all we need for the emulation.
The function doesn't dereference that pointer, it just looks at its
value to decide whether it is valid or not. (btw. we should check that
properly, because we cannot depend on VM entry failure pass-through like
the normal case.)

The dereference done in kvm_mmu_reload() should happen after EPTP
switching finishes, because the spec doesn't mention a VM exit for other
reason than invalid EPT_POINTER value.

>> just keep the original bug -- we want to eventually fix it and it's no
>> worse till then.
>
> Anyway, can you please confirm again what is the behavior that you
> are expecting if kvm_mmu_reload fails ? This would be a rarely used
> branch and I am actually fine diverging from what I think is right if
> I can get the reviewers to agree on a common thing.

kvm_mmu_reload() fails when mmu_check_root() is false, which means that
the pointed physical address is not backed. We've hit this corner-case
in the past -- Jim said that the chipset returns all 1s if a read is not
claimed.

So in theory, KVM should not fail kvm_mmu_reload(), but behave as if the
pointer pointed to a memory of all 1s, which would likely result in
EPT_MISCONFIG when the guest does a memory access.

It is a mishandled corner case, but turning it into VM exit would only
confuse an OS that receives the impossible VM exit and potentially
confuse reader of the KVM logic.

I think that not using kvm_mmu_reload() directly in EPTP switching is
best. The bug is not really something we care about.