Re: [PATCH 2/2] KVM: SVM: Set/clear CR8 write interception when AVIC is (de)activated
From: Tom Lendacky
Date: Tue Mar 10 2026 - 17:42:19 EST

On 3/10/26 13:35, Sean Christopherson wrote:
> On Tue, Mar 10, 2026, Tom Lendacky wrote:
>> On 3/10/26 12:48, Naveen N Rao wrote:
>>> On Tue, Mar 10, 2026 at 12:36:09PM -0500, Tom Lendacky wrote:
>>>> On 3/10/26 12:17, Sean Christopherson wrote:
>>>>> On Tue, Mar 10, 2026, Srikanth Aithal wrote:
>>>>>>
>>>>>> Hello Sean,
>>>>>>
>>>>>> From next-20260304 onwards [1], including recent next kernel next-20260309,
>>>>>> booting an SEV-ES guest on AMD EPYC Turin and AMD EPYC Genoa has been
>>>>>> failing. However, on EPYC Milan, the SEV-ES guest boots fine.
>>>>>
>>>>> ...
>>>>>
>>>>>> Bisecting shows that this commit is the first bad one. When I revert it, I
>>>>>> am able to boot the SEV-ES guest successfully on both Turin and Genoa
>>>>>> platforms:
>>>>>>
>>>>>> e992bf67bcbab07a7f59963b2c4ed32ef65c8431 is the first bad commit
>>>>>> commit e992bf67bcbab07a7f59963b2c4ed32ef65c8431
>>>>>> Author: Sean Christopherson <seanjc@xxxxxxxxxx>
>>>>>> Date: Tue Feb 3 11:07:10 2026 -0800
>>>>>
>>>>> Gah, I hate how KVM manages intercepts for SEV-ES+. Though to a large extent I
>>>>> blame the architecture for not simply making CR{0,4,8} intercept trap-like.
>>>>> Side topic, is the host actually allowed to trap CR3 writes? That seems like a
>>>>> huge gaping security flaw, especially for SNP+.
>>>>>
>>>>> Anyways, this should fix the immediate problem.
>>>>>
>>>>> diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
>>>>> index 33172f0e986b..b6072872b785 100644
>>>>> --- a/arch/x86/kvm/svm/avic.c
>>>>> +++ b/arch/x86/kvm/svm/avic.c
>>>>> @@ -237,7 +237,8 @@ static void avic_deactivate_vmcb(struct vcpu_svm *svm)
>>>>>  	vmcb->control.int_ctl &= ~(AVIC_ENABLE_MASK | X2APIC_MODE_MASK);
>>>>>  	vmcb->control.avic_physical_id &= ~AVIC_PHYSICAL_MAX_INDEX_MASK;
>>>>>
>>>>> -	svm_set_intercept(svm, INTERCEPT_CR8_WRITE);
>>>>> +	if (!sev_es_guest(svm->vcpu.kvm))
>>>>> +		svm_set_intercept(svm, INTERCEPT_CR8_WRITE);
>>>>>
>>>>>  	/*
>>>>>  	 * If running nested and the guest uses its own MSR bitmap, there
>>>>>
>>>>> Argh! The more I look at this code, the more frustrated I get. The unconditional
>>>>> setting of TRAP_CR8_WRITE for SEV-ES+ is flawed. When AVIC is enabled, KVM doesn't
>>>>
>>>> AVIC is disabled for SEV guests (see __sev_guest_init() and the
>>>> kvm_set_apicv_inhibit(kvm, APICV_INHIBIT_REASON_SEV) call at the end of
>>>> the function).
>>>
>>> AVIC gets inhibited globally, but continues to be enabled on
>>> vcpu_create() opportunistically -- see kvm_create_lapic(). It only gets
>>> disabled later during vcpu setup via
>>> vcpu_reset()->svm_vcpu_reset()->init_vmcb()->avic_init_vmcb()
>>
>> I'm just saying that the unconditional trap for CR8_WRITE isn't flawed
>> for SEV-ES+ because AVIC can't work with SEV, so there isn't any time
>> that CR8 writes shouldn't be trapped.
>
> Yeah, I forgot that (obviously).
>
> But sync_cr8_to_lapic() is very broken, no? INTERCEPT_CR8_WRITE will never be
> set, and svm->vmcb->control.int_ctl will become stale as soon as the VMSA is
> live, and so in all likelihood KVM is crushing CR8 to zero for SEV-ES guests.

I don't think so. V_TPR is written on #VMEXIT even for SEV-ES+ guests,
and since it is a trap, CR8 is set and so V_TPR should have that value.
That would imply sync_cr8_to_lapic() should do the right thing.

After attempting to verify this behavior, it turns out that writes to CR8
(and CR2) are, in fact, not trapped, but the APM was not updated with
this information (I'll send a patch to remove that code). KVM's CR8
value is, however, synced with the proper value through
sync_cr8_to_lapic() because V_TPR in the VMCB is updated on #VMEXIT.
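
To make that flow concrete, here is a small user-space model of it. This is
only a sketch, not kernel code: every toy_* name is hypothetical, the
structures are cut-down stand-ins for the VMCB control area, and it assumes
(as I read the discussion above) that sync_cr8_to_lapic() copies V_TPR from
int_ctl only when CR8 write interception is not set, with V_TPR occupying
the low four bits of int_ctl.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: V_TPR occupies the low 4 bits of int_ctl. */
#define V_TPR_MASK 0x0fu

/* Hypothetical, cut-down stand-in for the VMCB control area. */
struct toy_vmcb_control {
	uint32_t int_ctl;
	bool cr8_write_intercepted;
};

/* Hypothetical vCPU model; none of these fields exist in KVM as such. */
struct toy_vcpu {
	struct toy_vmcb_control control;
	uint8_t guest_cr8;	/* guest-side CR8, not visible to the host */
	uint8_t kvm_cr8;	/* KVM's software copy of CR8 */
};

/* The guest writes CR8. Nothing traps, so the host copy is untouched. */
void toy_guest_write_cr8(struct toy_vcpu *v, uint8_t val)
{
	v->guest_cr8 = val & V_TPR_MASK;
}

/* Models hardware refreshing V_TPR in the VMCB on #VMEXIT. */
void toy_vmexit(struct toy_vcpu *v)
{
	v->control.int_ctl = (v->control.int_ctl & ~V_TPR_MASK) | v->guest_cr8;
}

/* Models the sync step: pull V_TPR into the host copy unless intercepted. */
void toy_sync_cr8_to_lapic(struct toy_vcpu *v)
{
	if (!v->control.cr8_write_intercepted)
		v->kvm_cr8 = v->control.int_ctl & V_TPR_MASK;
}

/* Runs one write / #VMEXIT / sync cycle and returns the host's CR8 copy. */
uint8_t toy_demo(uint8_t val)
{
	struct toy_vcpu v = { 0 };

	toy_guest_write_cr8(&v, val);
	assert(v.kvm_cr8 == 0);	/* host copy is stale until the next exit */

	toy_vmexit(&v);
	toy_sync_cr8_to_lapic(&v);
	return v.kvm_cr8;
}
```

In this model the host copy only catches up after an exit, which matches the
behavior described above: no trap on the write itself, but the value is
picked up from V_TPR at the next #VMEXIT.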

Thanks,
Tom