Re: [PATCH 0/2] KVM: nVMX: Fix ept=n bugs where KVM runs L2 with guest CR3

From: Jim Mattson

Date: Thu Jun 04 2026 - 10:29:10 EST


On Thu, Jun 4, 2026 at 6:15 AM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
>
> On Wed, Jun 03, 2026, Jim Mattson wrote:
> > ...
> > IIRC, the tests in question are confirming PCI bus error semantics.
> > Why would the VMLAUNCH not succeed on bare metal?
>
> Well, I was assuming it would fail because it couldn't actually guarantee PCI Bus
> Errors, as it could very well stumble into actual device memory. But the test
> leaves TPR_THRESHOLD as '0', and so regardless of what value the CPU gets back,
> the consistency check will still pass.
>
> But! I'm pretty sure the test would generate #MCs, not PCI bus errors. The
> SDM very, very strongly implies that the reads will use WB:
>
> Bits 53:50 report the memory type that should be used for the VMCS, for
> data structures referenced by pointers in the VMCS (I/O bitmaps,
> virtual-APIC page, MSR areas for VMX transitions), and for the MSEG header.
> ^^^^^^^^^^^^^^^^^
>
> If software needs to access these data structures (e.g., to modify the
> contents of the MSR bitmaps), it can configure the paging structures to map
> them into the linear-address space. If it does so, it should establish
> mappings that use the memory type reported bits 53:50 in this MSR.
>
> As of this writing, all processors that support VMX operation indicate the
> write-back type. The values used are given in Table A-1.
>
> And _that_ will definitely cause problems, especially if the read hits device
> memory.

Good point. However, "problems" here are machine checks, right? Not VM
entry with invalid control field(s).

> That said, KVM's de facto ABI is that VMX instructions get PCI Bus Error semantics
> on accesses KVM can't handle, and it's just as easy to skip the consistency check.
> Since a read of 0xff guarantees the vTPR >= TPR_THRESHOLD, the check will pass
> regardless of TPR_THRESHOLD.

Hmmm...Is that an erratum? :)

> So, other than my stubbornness :-D, there's no reason to deliberately fail the
> check if KVM can't read memory. I'll go with this for v2:
>
> 	gpa_t vtpr_gpa = vmcs12->virtual_apic_page_addr + APIC_TASKPRI;
> 	u32 vtpr;
>
> 	if (!nested_cpu_has(vmcs12, CPU_BASED_TPR_SHADOW))
> 		return 0;
>
> 	if (CC(!page_address_valid(vcpu, vmcs12->virtual_apic_page_addr)))
> 		return -EINVAL;
>
> 	if (CC(!nested_cpu_has_vid(vmcs12) && vmcs12->tpr_threshold >> 4))
> 		return -EINVAL;
>
> 	/*
> 	 * Do the illegal vTPR vs. TPR Threshold consistency check if and only
> 	 * if KVM is configured to WARN on missed consistency checks, otherwise
> 	 * it's a waste of time. KVM needs to rely on hardware to fully detect
> 	 * an illegal combination due to the vTPR being writable by L1 at all
> 	 * times (it's an in-memory value, not a VMCS field). I.e. even if the
> 	 * check passes now, it might fail at the actual VM-Enter.
> 	 *
> 	 * If reading guest memory fails, skip the check as KVM's de facto ABI
> 	 * for VMX instruction accesses to non-existent memory is to provide
> 	 * PCI Bus Error semantics (reads return 0xFFs), in which case the vTPR
> 	 * is guaranteed to be greater than or equal to the threshold.
> 	 *
> 	 * Note! Deliberately use the VM-scoped API when reading guest memory,
> 	 * to ensure the read doesn't hit SMRAM when restoring L2 state on RSM,
> 	 * and only perform the check when in KVM_RUN, to avoid a false failure
> 	 * if userspace hasn't yet configured memslots during state restore.
> 	 */

I really wish that KVM dictated the save and restore sequences. :/

> 	if (warn_on_missed_cc && vcpu->wants_to_run &&
> 	    nested_cpu_has(vmcs12, CPU_BASED_TPR_SHADOW) &&
> 	    !nested_cpu_has_vid(vmcs12) &&
> 	    !nested_cpu_has2(vmcs12, SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES) &&
> 	    !kvm_read_guest(vcpu->kvm, vtpr_gpa, &vtpr, sizeof(vtpr)) &&
> 	    CC((vmcs12->tpr_threshold & GENMASK(3, 0)) > ((vtpr >> 4) & GENMASK(3, 0))))
> 		return -EINVAL;
>
> 	return 0;
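
For anyone following along, here is a standalone sketch of why a read that returns all 0xFFs can never trip the vTPR vs. TPR Threshold comparison quoted above. This is plain userspace C, not KVM code: the helper names are made up for illustration, and the only thing taken from the patch is the comparison itself (threshold bits 3:0 against vTPR bits 7:4).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a KVM function): mirrors the comparison in the
 * quoted patch, i.e. returns true if TPR threshold bits 3:0 are greater
 * than vTPR bits 7:4.
 */
bool threshold_exceeds_vtpr(uint32_t tpr_threshold, uint32_t vtpr)
{
	return (tpr_threshold & 0xf) > ((vtpr >> 4) & 0xf);
}

/*
 * Hypothetical helper: counts how many of the 16 possible 4-bit threshold
 * values would fail the comparison for a given vTPR register value.
 */
unsigned int count_failing_thresholds(uint32_t vtpr)
{
	unsigned int t, failures = 0;

	for (t = 0; t <= 0xf; t++) {
		if (threshold_exceeds_vtpr(t, vtpr))
			failures++;
	}
	return failures;
}

/* Sanity check: model a failed read as all ones, per the 0xFF convention. */
void self_check(void)
{
	assert(count_failing_thresholds(0xffffffffu) == 0);
}
```

With vTPR bits 7:4 equal to 0xf, no 4-bit threshold can be larger, so skipping the check on a failed read gives the same result as performing it against 0xFFs. A vTPR of 0, by contrast, fails for every non-zero threshold.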