Re: [PATCH 2/2] KVM: SVM: Set/clear CR8 write interception when AVIC is (de)activated
From: Sean Christopherson
Date: Tue Mar 10 2026 - 14:35:55 EST

On Tue, Mar 10, 2026, Tom Lendacky wrote:
> On 3/10/26 12:48, Naveen N Rao wrote:
> > On Tue, Mar 10, 2026 at 12:36:09PM -0500, Tom Lendacky wrote:
> >> On 3/10/26 12:17, Sean Christopherson wrote:
> >>> On Tue, Mar 10, 2026, Srikanth Aithal wrote:
> >>>>
> >>>> Hello Sean,
> >>>>
> >>>> From next-20260304 onwards [1], including recent next kernel next-20260309,
> >>>> booting an SEV-ES guest on AMD EPYC Turin and AMD EPYC Genoa has been
> >>>> failing. However, on EPYC Milan, the SEV-ES guest boots fine.
> >>>
> >>> ...
> >>>
> >>>> Bisecting shows that this commit is the first bad one. When I revert it, I
> >>>> am able to boot the SEV-ES guest successfully on both Turin and Genoa
> >>>> platforms:
> >>>>
> >>>> e992bf67bcbab07a7f59963b2c4ed32ef65c8431 is the first bad commit
> >>>> commit e992bf67bcbab07a7f59963b2c4ed32ef65c8431
> >>>> Author: Sean Christopherson <seanjc@xxxxxxxxxx>
> >>>> Date: Tue Feb 3 11:07:10 2026 -0800
> >>>
> >>> Gah, I hate how KVM manages intercepts for SEV-ES+. Though to a large extent I
> >>> blame the architecture for not simply making CR{0,4,8} intercept trap-like.
> >>> Side topic, is the host actually allowed to trap CR3 writes? That seems like a
> >>> huge gaping security flaw, especially for SNP+.
> >>>
> >>> Anyways, this should fix the immediate problem.
> >>>
> >>> diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> >>> index 33172f0e986b..b6072872b785 100644
> >>> --- a/arch/x86/kvm/svm/avic.c
> >>> +++ b/arch/x86/kvm/svm/avic.c
> >>> @@ -237,7 +237,8 @@ static void avic_deactivate_vmcb(struct vcpu_svm *svm)
> >>>         vmcb->control.int_ctl &= ~(AVIC_ENABLE_MASK | X2APIC_MODE_MASK);
> >>>         vmcb->control.avic_physical_id &= ~AVIC_PHYSICAL_MAX_INDEX_MASK;
> >>>
> >>> -       svm_set_intercept(svm, INTERCEPT_CR8_WRITE);
> >>> +       if (!sev_es_guest(svm->vcpu.kvm))
> >>> +               svm_set_intercept(svm, INTERCEPT_CR8_WRITE);
> >>>
> >>>         /*
> >>>          * If running nested and the guest uses its own MSR bitmap, there
> >>>
> >>> Argh! The more I look at this code, the more frustrated I get. The unconditional
> >>> setting of TRAP_CR8_WRITE for SEV-ES+ is flawed. When AVIC is enabled, KVM doesn't
> >>
> >> AVIC is disabled for SEV guests (see __sev_guest_init() and the
> >> kvm_set_apicv_inhibit(kvm, APICV_INHIBIT_REASON_SEV) call at the end of
> >> the function).
> >
> > AVIC gets inhibited globally, but continues to be enabled on
> > vcpu_create() opportunistically -- see kvm_create_lapic(). It only gets
> > disabled later during vcpu setup via
> > vcpu_reset()->svm_vcpu_reset()->init_vmcb()->avic_init_vmcb()
>
> I'm just saying that the unconditional trap for CR8_WRITE isn't flawed
> for SEV-ES+ because AVIC can't work with SEV, so there isn't any time
> that CR8 writes shouldn't be trapped.

Yeah, I forgot that (obviously).

But sync_cr8_to_lapic() is very broken, no? INTERCEPT_CR8_WRITE will never be
set, and svm->vmcb->control.int_ctl will become stale as soon as the VMSA is
live, and so in all likelihood KVM is crushing CR8 to zero for SEV-ES guests.
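
To make the suspected failure mode concrete, here is a rough userspace model,
not the actual KVM code. It assumes the sync path copies the low four bits of
int_ctl (V_TPR) into the local APIC whenever the CR8 write intercept is clear,
and that int_ctl stays at zero once the VMSA is live, as described above. All
model_* names and the struct are hypothetical stand-ins for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* V_TPR is assumed to occupy the low four bits of int_ctl. */
#define MODEL_V_TPR_MASK 0x0fu

/* Hypothetical stand-in for the per-vCPU state involved; not KVM's structs. */
struct model_vcpu {
	uint32_t vmcb_int_ctl;      /* host-visible copy; assumed stale (zero) here */
	bool cr8_write_intercepted; /* models INTERCEPT_CR8_WRITE being set */
	uint8_t lapic_tpr;          /* KVM's software view of the guest's CR8 */
};

/* Models a trapped CR8 write: the host learns the value the guest wrote. */
static void model_trap_cr8_write(struct model_vcpu *vcpu, uint8_t cr8)
{
	assert(vcpu);
	vcpu->lapic_tpr = cr8 & MODEL_V_TPR_MASK;
	/* vmcb_int_ctl is left untouched to model the assumed staleness. */
}

/* Models the post-exit sync: with no intercept, trust V_TPR from int_ctl. */
static void model_sync_cr8_to_lapic(struct model_vcpu *vcpu)
{
	assert(vcpu);
	if (!vcpu->cr8_write_intercepted)
		vcpu->lapic_tpr = vcpu->vmcb_int_ctl & MODEL_V_TPR_MASK;
}

/* Returns the TPR the host ends up with after one guest CR8 write and exit. */
static uint8_t model_tpr_after_exit(bool intercepted, uint8_t guest_cr8)
{
	struct model_vcpu vcpu = {
		.vmcb_int_ctl = 0,
		.cr8_write_intercepted = intercepted,
		.lapic_tpr = 0,
	};

	model_trap_cr8_write(&vcpu, guest_cr8);
	model_sync_cr8_to_lapic(&vcpu);
	return vcpu.lapic_tpr;
}
```

In this model, model_tpr_after_exit(false, 5) returns 0 because the sync
overwrites the trapped value with the stale field, while
model_tpr_after_exit(true, 5) returns 5 because the sync is skipped. Whether
real hardware leaves int_ctl stale in this way is exactly the open question
raised above.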