Re: [PATCH 2/2] KVM: SVM: Set/clear CR8 write interception when AVIC is (de)activated

From: Sean Christopherson

Date: Tue Mar 10 2026 - 13:23:49 EST


On Tue, Mar 10, 2026, Srikanth Aithal wrote:
>
> Hello Sean,
>
> From next-20260304 onwards [1], including recent next kernel next-20260309,
> booting an SEV-ES guest on AMD EPYC Turin and AMD EPYC Genoa has been
> failing. However, on EPYC Milan, the SEV-ES guest boots fine.

...

> Bisecting shows that this commit is the first bad one. When I revert it, I
> am able to boot the SEV-ES guest successfully on both Turin and Genoa
> platforms:
>
> e992bf67bcbab07a7f59963b2c4ed32ef65c8431 is the first bad commit
> commit e992bf67bcbab07a7f59963b2c4ed32ef65c8431
> Author: Sean Christopherson <seanjc@xxxxxxxxxx>
> Date: Tue Feb 3 11:07:10 2026 -0800

Gah, I hate how KVM manages intercepts for SEV-ES+. Though to a large extent I
blame the architecture for not simply making the CR{0,4,8} intercepts trap-like.
Side topic, is the host actually allowed to trap CR3 writes? That seems like a
huge gaping security flaw, especially for SNP+.

Anyways, this should fix the immediate problem.

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 33172f0e986b..b6072872b785 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -237,7 +237,8 @@ static void avic_deactivate_vmcb(struct vcpu_svm *svm)
 	vmcb->control.int_ctl &= ~(AVIC_ENABLE_MASK | X2APIC_MODE_MASK);
 	vmcb->control.avic_physical_id &= ~AVIC_PHYSICAL_MAX_INDEX_MASK;

-	svm_set_intercept(svm, INTERCEPT_CR8_WRITE);
+	if (!sev_es_guest(svm->vcpu.kvm))
+		svm_set_intercept(svm, INTERCEPT_CR8_WRITE);

 	/*
 	 * If running nested and the guest uses its own MSR bitmap, there

Argh! The more I look at this code, the more frustrated I get. The unconditional
setting of TRAP_CR8_WRITE for SEV-ES+ is flawed. When AVIC is enabled, KVM doesn't
need to trap CR8 writes because hardware will update the backing page. I'm guessing
Windows doesn't support running as an SEV-ES guest, which is why no one has noticed.
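
Purely as an illustration of the bookkeeping being discussed, here is a toy
userspace model. It is not KVM code: the struct, the MODEL_* bits, and the
model_*() helpers are all made up, and the initial state assumes (as described
above) that SEV-ES guests get the trap-style CR8 write exit unconditionally
while other guests get the regular intercept. It models the patched
deactivate path, and also shows that the trap bit stays set for SEV-ES even
while AVIC is active.

```c
#include <stdbool.h>

/* Hypothetical bits; these are NOT the kernel's real intercept numbers. */
enum {
	MODEL_INTERCEPT_CR8_WRITE = 1,	/* fault-like intercept */
	MODEL_TRAP_CR8_WRITE      = 2,	/* trap-like exit used for SEV-ES+ */
};

/* Hypothetical stand-in for the per-vCPU state that matters here. */
struct model_vcpu {
	bool sev_es;			/* guest register state is encrypted */
	unsigned int intercepts;	/* bitmask of MODEL_* bits */
};

/* Assumed starting point: SEV-ES uses the trap, everything else the intercept. */
void model_init(struct model_vcpu *v, bool sev_es)
{
	v->sev_es = sev_es;
	v->intercepts = sev_es ? MODEL_TRAP_CR8_WRITE
			       : MODEL_INTERCEPT_CR8_WRITE;
}

/* AVIC on: the regular intercept is dropped; the trap bit is left untouched. */
void model_avic_activate(struct model_vcpu *v)
{
	v->intercepts &= ~MODEL_INTERCEPT_CR8_WRITE;
}

/* AVIC off, with the fix: only re-arm the regular intercept for non-SEV-ES. */
void model_avic_deactivate(struct model_vcpu *v)
{
	if (!v->sev_es)
		v->intercepts |= MODEL_INTERCEPT_CR8_WRITE;
}
```

Running an SEV-ES vCPU through activate and then deactivate in this model
leaves only MODEL_TRAP_CR8_WRITE set, whereas a non-SEV-ES vCPU ends up with
MODEL_INTERCEPT_CR8_WRITE set again.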

Actually, it's worse than that. sync_cr8_to_lapic() will straight up clobber the
backing page. Presumably hardware never actually uses TPR from the AVIC backing
page, but it's still gross. sync_lapic_to_cr8() is also beyond useless.

And all of the sync code should pivot on guest_state_protected, not sev_es_guest().

For now, I'll just post the above (assuming it fixes the issue). But this code
needs some love sooner rather than later.