Re: [PATCH 2/2] KVM: SVM: Set/clear CR8 write interception when AVIC is (de)activated
From: Aithal, Srikanth
Date: Tue Mar 10 2026 - 12:14:05 EST
Hello Sean,
Since next-20260304 [1], and still as of next-20260309, booting an SEV-ES guest on AMD EPYC Turin and AMD EPYC Genoa has been failing. On EPYC Milan, however, the SEV-ES guest boots fine.
I am using the same QEMU command line (given below) with the same versions of QEMU and OVMF on all three platforms.
"$QEMU_BIN" \
-machine q35,confidential-guest-support=sev0,vmport=off \
-object sev-guest,id=sev0,policy=0x5,cbitpos=51,reduced-phys-bits=1 \
-name guest=vm,debug-threads=on \
-drive if=pflash,format=raw,unit=0,file="$OVMF_PATH",readonly=on \
-m 2048 \
-object memory-backend-ram,size=2048M,id=mem-machine_mem \
-smp 1,maxcpus=1,cores=1,threads=1,dies=1,sockets=1 \
-cpu host \
-drive id=disk0,file="$DISK_IMAGE",format=qcow2,if=none \
-device virtio-scsi-pci,id=scsi0,disable-legacy=on,iommu_platform=true \
-device scsi-hd,drive=disk0 \
-enable-kvm \
-nographic \
-monitor tcp:localhost:4444,server,nowait
QEMU version: v10.2.1
OVMF version: edk2-stable202602
The SEV-ES guest crashes with the following QEMU trace:
error: kvm run failed Invalid argument
EAX=00000000 EBX=00000000 ECX=00000000 EDX=00a10f10
ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000
EIP=0000fff0 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 00000000 0000ffff 00009300
CS =f000 ffff0000 0000ffff 00009b00
SS =0000 00000000 0000ffff 00009300
DS =0000 00000000 0000ffff 00009300
FS =0000 00000000 0000ffff 00009300
GS =0000 00000000 0000ffff 00009300
LDT=0000 00000000 0000ffff 00008200
TR =0000 00000000 0000ffff 00008b00
GDT= 00000000 0000ffff
IDT= 00000000 0000ffff
CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000000
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000000
Code=30 17 4d 99 a6 74 ad 5a a1 1d d2 22 78 9f 73 25 ab 00 2f c0 <cd> d3 ee 26 63 0d f5 de f3 ea c3 91 28 ba b5 ac ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
KVM host serial log message that appears when the crash happens:
[ 4379.695497] kvm_amd: kvm [5809]: vcpu0, guest rIP: 0x0 vmgexit: unsupported event - exit_info_1=0x18, exit_info_2=0x0
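For reference, the exit_info_1 value in that line can be pulled out and compared against the SVM exit code for a CR8 write. The sketch below is only an illustration: `parse_exit_info_1()` and `looks_like_cr8_write()` are hypothetical helpers, and `ASSUMED_SVM_EXIT_WRITE_CR8` (0x018) is my understanding of the value in the kernel's SVM uapi header, so please verify it against your tree. A match would suggest that the guest reported an intercepted CR8 write it could not handle, but that reading is an assumption, not a confirmed diagnosis.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Assumed value of SVM_EXIT_WRITE_CR8; verify against the kernel headers. */
#define ASSUMED_SVM_EXIT_WRITE_CR8 0x018ULL

/*
 * Hypothetical helper: extract the exit_info_1 field from a KVM
 * "vmgexit: unsupported event" log line. Returns false if the field
 * is missing or is not a hexadecimal number.
 */
static bool parse_exit_info_1(const char *line, unsigned long long *out)
{
	const char *field = strstr(line, "exit_info_1=");

	if (!field)
		return false;

	/* %llx accepts an optional "0x" prefix, as printed by the kernel. */
	return sscanf(field, "exit_info_1=%llx", out) == 1;
}

/* True if the logged exit_info_1 equals the assumed CR8 write exit code. */
static bool looks_like_cr8_write(const char *line)
{
	unsigned long long value;

	return parse_exit_info_1(line, &value) &&
	       value == ASSUMED_SVM_EXIT_WRITE_CR8;
}
```

Passing the host log line above to `looks_like_cr8_write()` returns true, since 0x18 equals the assumed constant.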
Bisecting shows that this commit is the first bad one. When I revert it, I am able to boot the SEV-ES guest successfully on both Turin and Genoa platforms:
e992bf67bcbab07a7f59963b2c4ed32ef65c8431 is the first bad commit
commit e992bf67bcbab07a7f59963b2c4ed32ef65c8431
Author: Sean Christopherson <seanjc@xxxxxxxxxx>
Date: Tue Feb 3 11:07:10 2026 -0800
KVM: SVM: Set/clear CR8 write interception when AVIC is (de)activated
[1]: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git, next-20260304
I am happy to provide any further information required. Thank you.
Srikanth Aithal <sraithal@xxxxxxx>
On 2/4/2026 12:37 AM, Sean Christopherson wrote:
Explicitly set/clear CR8 write interception when AVIC is (de)activated to
fix a bug where KVM leaves the interception enabled after AVIC is
activated. E.g. if KVM emulates INIT=>WFS while AVIC is deactivated, CR8
will remain intercepted in perpetuity.
On its own, the dangling CR8 intercept is "just" a performance issue, but
combined with the TPR sync bug fixed by commit d02e48830e3f ("KVM: SVM:
Sync TPR from LAPIC into VMCB::V_TPR even if AVIC is active"), the dangling
intercept is fatal to Windows guests as the TPR seen by hardware gets
wildly out of sync with reality.
Note, VMX isn't affected by the bug as TPR_THRESHOLD is explicitly ignored
when Virtual Interrupt Delivery is enabled, i.e. when APICv is active in
KVM's world. I.e. there's no need to trigger update_cr8_intercept(), this
is firmly an SVM implementation flaw/detail.
WARN if KVM gets a CR8 write #VMEXIT while AVIC is active, as KVM should
never enter the guest with AVIC enabled and CR8 writes intercepted.
Fixes: 3bbf3565f48c ("svm: Do not intercept CR8 when enable AVIC")
Cc: stable@xxxxxxxxxxxxxxx
Cc: Jim Mattson <jmattson@xxxxxxxxxx>
Cc: Naveen N Rao (AMD) <naveen@xxxxxxxxxx>
Cc: Maciej S. Szmigiero <maciej.szmigiero@xxxxxxxxxx>
Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx>
---
arch/x86/kvm/svm/avic.c | 6 ++++--
arch/x86/kvm/svm/svm.c | 9 +++++----
2 files changed, 9 insertions(+), 6 deletions(-)
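
The invariant described in the changelog (CR8 writes are intercepted exactly when AVIC is inactive, and the guest is never entered with both AVIC enabled and CR8 writes intercepted) can be sketched as a small standalone model. Everything below is hypothetical: `struct model_vcpu`, the `model_*` helpers and `MODEL_INTERCEPT_CR8_WRITE` are stand-ins, not KVM's types or functions, and the model deliberately ignores SEV-ES, nested virtualization and hybrid-AVIC mode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for one bit of the VMCB intercept bitmap. */
#define MODEL_INTERCEPT_CR8_WRITE (1u << 0)

/* Hypothetical stand-in for the per-vCPU state the patch touches. */
struct model_vcpu {
	uint32_t intercepts;	/* models the VMCB intercept vector */
	bool avic_active;	/* models AVIC_ENABLE_MASK in int_ctl */
};

/* Models init_vmcb() after the patch: always start with CR8 intercepted. */
static void model_init_vmcb(struct model_vcpu *v)
{
	v->intercepts = MODEL_INTERCEPT_CR8_WRITE;
	v->avic_active = false;
}

/* Models avic_activate_vmcb(): enable AVIC and drop the CR8 intercept. */
static void model_avic_activate(struct model_vcpu *v)
{
	v->avic_active = true;
	v->intercepts &= ~MODEL_INTERCEPT_CR8_WRITE;
}

/* Models avic_deactivate_vmcb(): disable AVIC and restore the intercept. */
static void model_avic_deactivate(struct model_vcpu *v)
{
	v->avic_active = false;
	v->intercepts |= MODEL_INTERCEPT_CR8_WRITE;
}

/* The invariant: never AVIC active and CR8 writes intercepted together. */
static bool model_invariant_holds(const struct model_vcpu *v)
{
	bool intercepted = (v->intercepts & MODEL_INTERCEPT_CR8_WRITE) != 0;

	return !(v->avic_active && intercepted);
}

/* Loosely mirrors the new WARN: check the invariant before guest entry. */
static void model_enter_guest(const struct model_vcpu *v)
{
	assert(model_invariant_holds(v));
}
```

In this model, an init followed by any sequence of activate/deactivate calls keeps `model_invariant_holds()` true, which is the property the patch aims to enforce in the real VMCB.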
diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 44e07c27b190..13a4a8949aba 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -189,12 +189,12 @@ static void avic_activate_vmcb(struct vcpu_svm *svm)
struct kvm_vcpu *vcpu = &svm->vcpu;
vmcb->control.int_ctl &= ~(AVIC_ENABLE_MASK | X2APIC_MODE_MASK);
-
vmcb->control.avic_physical_id &= ~AVIC_PHYSICAL_MAX_INDEX_MASK;
vmcb->control.avic_physical_id |= avic_get_max_physical_id(vcpu);
-
vmcb->control.int_ctl |= AVIC_ENABLE_MASK;
+ svm_clr_intercept(svm, INTERCEPT_CR8_WRITE);
+
/*
* Note: KVM supports hybrid-AVIC mode, where KVM emulates x2APIC MSR
* accesses, while interrupt injection to a running vCPU can be
@@ -226,6 +226,8 @@ static void avic_deactivate_vmcb(struct vcpu_svm *svm)
vmcb->control.int_ctl &= ~(AVIC_ENABLE_MASK | X2APIC_MODE_MASK);
vmcb->control.avic_physical_id &= ~AVIC_PHYSICAL_MAX_INDEX_MASK;
+ svm_set_intercept(svm, INTERCEPT_CR8_WRITE);
+
/*
* If running nested and the guest uses its own MSR bitmap, there
* is no need to update L0's msr bitmap
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index e8313fdc5465..aa3ab22215f5 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1077,8 +1077,7 @@ static void init_vmcb(struct kvm_vcpu *vcpu, bool init_event)
svm_set_intercept(svm, INTERCEPT_CR0_WRITE);
svm_set_intercept(svm, INTERCEPT_CR3_WRITE);
svm_set_intercept(svm, INTERCEPT_CR4_WRITE);
- if (!kvm_vcpu_apicv_active(vcpu))
- svm_set_intercept(svm, INTERCEPT_CR8_WRITE);
+ svm_set_intercept(svm, INTERCEPT_CR8_WRITE);
set_dr_intercepts(svm);
@@ -2674,9 +2673,11 @@ static int dr_interception(struct kvm_vcpu *vcpu)
static int cr8_write_interception(struct kvm_vcpu *vcpu)
{
- int r;
-
u8 cr8_prev = kvm_get_cr8(vcpu);
+ int r;
+
+ WARN_ON_ONCE(kvm_vcpu_apicv_active(vcpu));
+
/* instruction emulation calls kvm_set_cr8() */
r = cr_interception(vcpu);
if (lapic_in_kernel(vcpu))