Re: [PATCH] KVM: SVM: Do not virtualize MSR accesses for APIC LVTT register
From: Maxim Levitsky
Date: Thu Jul 28 2022 - 06:57:13 EST
On Thu, 2022-07-28 at 15:55 +0700, Suravee Suthikulpanit wrote:
> Maxim,
>
> On 7/28/22 2:38 PM, Maxim Levitsky wrote:
> > On Sun, 2022-07-24 at 22:34 -0500, Suravee Suthikulpanit wrote:
> > > AMD does not support APIC TSC-deadline timer mode. AVIC hardware
> > > will generate GP fault when guest kernel writes 1 to bits [18]
> > > of the APIC LVTT register (offset 0x32) to set the timer mode.
> > > (Note: bit 18 is reserved on AMD system).
> > >
> > > Therefore, always intercept and let KVM emulate the MSR accesses.
> > >
> > > Fixes: f3d7c8aa6882 ("KVM: SVM: Fix x2APIC MSRs interception")
> > > Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@xxxxxxx>
> > > ---
> > > arch/x86/kvm/svm/svm.c | 9 ++++++++-
> > > 1 file changed, 8 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > > index aef63aae922d..3e0639a68385 100644
> > > --- a/arch/x86/kvm/svm/svm.c
> > > +++ b/arch/x86/kvm/svm/svm.c
> > > @@ -118,7 +118,14 @@ static const struct svm_direct_access_msrs {
> > > { .index = X2APIC_MSR(APIC_ESR), .always = false },
> > > { .index = X2APIC_MSR(APIC_ICR), .always = false },
> > > { .index = X2APIC_MSR(APIC_ICR2), .always = false },
> > > - { .index = X2APIC_MSR(APIC_LVTT), .always = false },
> > > +
> > > + /*
> > > + * Note:
> > > + * AMD does not virtualize APIC TSC-deadline timer mode, but it is
> > > + * emulated by KVM. When setting APIC LVTT (0x832) register bit 18,
> > > + * the AVIC hardware would generate GP fault. Therefore, always
> > > + * intercept the MSR 0x832, and do not setup direct_access_msr.
> > > + */
> > > { .index = X2APIC_MSR(APIC_LVTTHMR), .always = false },
> > > { .index = X2APIC_MSR(APIC_LVTPC), .always = false },
> > > { .index = X2APIC_MSR(APIC_LVT0), .always = false },
> >
> > LVT is not something I would expect x2avic to even try to emulate, I would expect
> > it to dumbly forward the write to apic backing page (garbage in, garbage out) and then
> > signal trap vmexit?
> >
> > I also think that regular AVIC works like that (just forwards the write to the page).
>
> The main difference b/w AVIC and x2AVIC is the MSR interception control, which needs to
> not-intercept x2APIC MSRs for x2AVIC (allowing HW to virtualize MSR accesses).
> However, the hypervisor can decide which x2APIC MSR to intercept and emulate.
>
> > I am asking because there is a remote possibility that due to some bug the guest got
> > direct access to x2apic registers of the host, and this is how you got that #GP.
> > Could you double check it?
>
> I have verified this behavior with the HW designer and requested them to document
> this in the next AMD programmers manual that will include x2AVIC details.
I guess this implies that when the guest has direct access to the LVTT MSR, the x2AVIC
redirection happens after microcode has already checked some things, such as reserved bits.
You are also welcome to check with the hardware team how all the other APIC MSRs behave: there
could be similar cases, maybe even some MSRs which don't go through the x2AVIC flow.
Assuming that this is really the case (I am just very afraid of CVEs),
then this patch is all right.
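For reference, here is a minimal standalone sketch of how the numbers in the patch comment fit together: the xAPIC LVTT offset (0x320) maps to x2APIC MSR 0x832, and bit 18 selects TSC-deadline mode. The constants are copied by hand from the kernel headers as I remember them, and lvtt_write_sets_tscdeadline() is a hypothetical helper for illustration only, not something that exists in KVM.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values reproduced by hand for a standalone sketch (assumed to match apicdef.h). */
#define APIC_BASE_MSR               0x800u
#define APIC_LVTT                   0x320u
#define APIC_LVT_TIMER_TSCDEADLINE  (2u << 17)  /* timer mode field = 10b, i.e. bit 18 */

/* Same mapping the patch relies on: one MSR per 16-byte xAPIC register. */
#define X2APIC_MSR(r)               (APIC_BASE_MSR + ((r) >> 4))

/*
 * Hypothetical helper: true if a WRMSR targets the LVTT register and
 * sets bit 18, which is the write reported to #GP under x2AVIC on AMD.
 */
static inline bool lvtt_write_sets_tscdeadline(uint32_t msr, uint64_t val)
{
	return msr == X2APIC_MSR(APIC_LVTT) &&
	       (val & APIC_LVT_TIMER_TSCDEADLINE) != 0;
}
```

With the MSR always intercepted, a write like this reaches KVM's emulation path instead of the hardware check.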
So with all that said:
Reviewed-by: Maxim Levitsky <mlevitsk@xxxxxxxxxx>
Best regards,
Maxim Levitsky
>
> > We really need x2avic (and vNMI) spec to be published to know exactly how all of this
> > is supposed to work.
>
> I have raised the concern to the team responsible for publishing the doc.
>
> Best Regards,
> Suravee
>