Re: [PATCH] KVM: TDX: Fix APIC MSR ranges in tdx_has_emulated_msr()
From: Edgecombe, Rick P
Date: Fri Apr 03 2026 - 14:41:03 EST
On Fri, 2026-04-03 at 09:30 -0700, Sean Christopherson wrote:
> On Thu, Mar 19, 2026, Rick P Edgecombe wrote:
> > On Thu, 2026-03-19 at 15:40 +0800, Binbin Wu wrote:
> > > tdx_has_emulated_msr() is used by KVM to decide whether to emulate a MSR access from the
> > > TDVMCALL or just return the error code.
> > >
> > > During an off-list discussion, Rick noted that #VE reduction could change the behavior of
> > > accessing an MSR (e.g., from #VE to #GP or to be virtualized by the TDX module) without
> > > KVM knowing.Because KVM lacks the full context to perfectly decide if an MSR should be
> > > emulated, the question was raised: Can we just delete tdx_has_emulated_msr() entirely?
> > >
> > > However, these native type x2apic MSRs are a special case. Since the TDX module owns the
> > > APICv page, KVM cannot emulate these MSRs. If we remove tdx_has_emulated_msr(), a guest
> > > directly issuing TDVMCALLs for these native type x2apic MSRs will trigger a silent failure,
> > > even though this is the guest's fault.
> > >
> > > It comes down to a tradeoff. Should we prioritize code simplicity by dropping the function,
> > > or keep it to explicitly catch this misbehaving guest corner case?
> >
> > I think from KVM's perspective it doesn't want to help the guest behave
> > correctly.
>
> Uh, yes KVM does does. KVM is responsible for emulating the APIC timer, isn't it?
Yea totally. We need to emulate the interface accurately. But we are kind of
making up the contract after the fact. If the guest performs the wrong type of
MSR write, should we make the contract that the VMM should help it catch it's
mistake?
>
> > So we can ignore that I think. But it does really care to not define
> > any specific guest ABI that it has to maintain. So tdx_has_emulated_msr() has
> > some value there. And even more, it wants to not allow the guest to hurt the
> > host.
> >
> > On the latter point, another problem with deleting tdx_has_emulated_msr() is the
> > current code path skips the checks done in the other MSR paths. So we would need
> > to call some appropriate higher up MSR helper to protect the host? And that
> > wades into the CPUID bit consistency issues.
> >
> > So maybe... could we do a more limited version of the deletion where we allow
> > all the APIC MSRs through? We'd have to check that it won't cause problems.
>
> What? No. KVM can't get actually read/write most (all?) MSRs, allowing access
> is far worse than returning an error, as for all intents and purposes KVM will
> silently drop writes, and return garbage on reads.
>
> > Failing that, we should maybe just explicitly list the ones TDX supports rather
> > than the current way we define the APIC ones. As you mention below, it's not
> > correct in other ways too so it could be more robust.
>
> No? Don't we just want to allow access to MSRs that aren't accelerated? What
> the TDX-Module supports is largely irrelevant, I think.
Not sure if I might be missing the point here. As above, we don't have enough
info to know which MSRs are accelerated. If the guest enabled #VE reduction, it
changes which ones are accelerated and the VMM is not notified. I think the
below is a sane limitation, but doesn't lets KVM perfectly notify the guest when
it screws up.
So the line would be to block MSRs that can never be emulated.
BTW, I've been treating this secret contract change as an arch mistake to at
least not build on. It's a whole subject though... Let me know if you are
interested in the details.
>
> diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> index 1e47c194af53..28e87630870b 100644
> --- a/arch/x86/kvm/vmx/tdx.c
> +++ b/arch/x86/kvm/vmx/tdx.c
> @@ -2116,23 +2116,26 @@ bool tdx_has_emulated_msr(u32 index)
> case MSR_IA32_MC0_CTL2 ... MSR_IA32_MCx_CTL2(KVM_MAX_MCE_BANKS) - 1:
> /* MSR_IA32_MCx_{CTL, STATUS, ADDR, MISC, CTL2} */
> case MSR_KVM_POLL_CONTROL:
> + /*
> + * x2APIC registers that are virtualized by the CPU can't be
> + * emulated, KVM doesn't have access to the virtual APIC page.
> + */
> + case X2APIC_MSR(APIC_ID):
> + case X2APIC_MSR(APIC_LVR):
> + case X2APIC_MSR(APIC_LDR):
> + case X2APIC_MSR(APIC_SPIV):
> + case X2APIC_MSR(APIC_ESR):
> + case X2APIC_MSR(APIC_ICR):
> + case X2APIC_MSR(APIC_LVTT):
> + case X2APIC_MSR(APIC_LVTTHMR):
> + case X2APIC_MSR(APIC_LVTPC):
> + case X2APIC_MSR(APIC_LVT0):
> + case X2APIC_MSR(APIC_LVT1):
> + case X2APIC_MSR(APIC_LVTERR):
> + case X2APIC_MSR(APIC_TMICT):
> + case X2APIC_MSR(APIC_TMCCT):
> + case X2APIC_MSR(APIC_TDCR):
> return true;
> - case APIC_BASE_MSR ... APIC_BASE_MSR + 0xff:
> - /*
> - * x2APIC registers that are virtualized by the CPU can't be
> - * emulated, KVM doesn't have access to the virtual APIC page.
> - */
> - switch (index) {
> - case X2APIC_MSR(APIC_TASKPRI):
> - case X2APIC_MSR(APIC_PROCPRI):
> - case X2APIC_MSR(APIC_EOI):
> - case X2APIC_MSR(APIC_ISR) ... X2APIC_MSR(APIC_ISR + APIC_ISR_NR):
> - case X2APIC_MSR(APIC_TMR) ... X2APIC_MSR(APIC_TMR + APIC_ISR_NR):
> - case X2APIC_MSR(APIC_IRR) ... X2APIC_MSR(APIC_IRR + APIC_ISR_NR):
> - return false;
> - default:
> - return true;
> - }
> default:
> return false;
> }