Re: [PATCH v3 0/6] KVM: SVM: Fix DEBUGCTL bugs
From: Maxim Levitsky
Date: Mon Apr 14 2025 - 11:10:54 EST
On Mon, 2025-04-14 at 12:02 +0530, Sandipan Das wrote:
> On 4/9/2025 4:13 AM, Sean Christopherson wrote:
> > On Tue, Apr 01, 2025, Maxim Levitsky wrote:
> > > On Thu, 2025-02-27 at 14:24 -0800, Sean Christopherson wrote:
> > > > Fix a long-lurking bug in SVM where KVM runs the guest with the host's
> > > > DEBUGCTL if LBR virtualization is disabled. AMD CPUs rather stupidly
> > > > context switch DEBUGCTL if and only if LBR virtualization is enabled (not
> > > > just supported, but fully enabled).
> > > >
> > > > The bug has gone unnoticed because until recently, the only bits that
> > > > KVM would leave set were things like BTF, which are guest visible but
> > > > won't cause functional problems unless guest software is being especially
> > > > particular about #DBs.
> > > >
> > > > The bug was exposed by the addition of BusLockTrap ("Detect" in the kernel),
> > > > as the resulting #DBs due to split-lock accesses in guest userspace (lol
> > > > Steam) get reflected into the guest by KVM.
> > > >
> > > > Note, I don't love suppressing DEBUGCTL.BTF, but practically speaking that's
> > > > likely the behavior that SVM guests have gotten the vast, vast majority of
> > > > the time, and given that it's the behavior on Intel, it's (hopefully) a safe
> > > > option for a fix, e.g. versus trying to add proper BTF virtualization on the
> > > > fly.
> > > >
> > > > v3:
> > > > - Suppress BTF, as KVM doesn't actually support it. [Ravi]
> > > > - Actually load the guest's DEBUGCTL (though amusingly, with BTF squashed,
> > > > it's guaranteed to be '0' in this scenario). [Ravi]
> > > >
> > > > v2:
> > > > - Load the guest's DEBUGCTL instead of simply zeroing it on VMRUN.
> > > > - Drop bits 5:3 from guest DEBUGCTL so that KVM doesn't let the guest
> > > > unintentionally enable BusLockTrap (AMD repurposed bits). [Ravi]
> > > > - Collect a review. [Xiaoyao]
> > > > - Make bits 5:3 fully reserved, in a separate not-for-stable patch.
> > > >
> > > > v1: https://lore.kernel.org/all/20250224181315.2376869-1-seanjc@xxxxxxxxxx
> > > >
> > >
> > > Hi,
> > >
> > > Amusingly there is another DEBUGCTL issue, which I just got to the bottom of.
> > > (if I am not mistaken of course).
> > >
> > > We currently don't let the guest set DEBUGCTL.FREEZE_WHILE_SMM and neither
> > > set it ourselves in GUEST_IA32_DEBUGCTL vmcs field, even when supported by the host
> > > (If I read the code correctly, I didn't verify this in runtime)
> >
> > Ugh, SMM. Yeah, KVM doesn't propagate DEBUGCTLMSR_FREEZE_IN_SMM to the guest
> > value. KVM intercepts reads and writes to DEBUGCTL, so it should be easy enough
> > to shove the bit in on writes, and drop it on reads.
> >
> > > This means that the host #SMIs will interfere with the guest PMU. In
> > > particular this causes the 'pmu' kvm-unit-test to fail, which is something
> > > that our CI caught.
> > >
> > > I think that kvm should just set this bit, or even better, use the host value
> > > of this bit, and hide it from the guest, because the guest shouldn't know
> > > about host's smm, and we AFAIK don't really support freezing perfmon when the
> > > guest enters its own emulated SMM.
> >
> > Agreed. Easy thing is to use the host's value, so that KVM doesn't need to check
> > for its existence. I can't think of anything that would go sideways by freezing
> > perfmon if the host happens to take an SMI.
> >
> > > What do you think? I'll post patches if you think that this is a good idea.
> > > (A temp hack to set this bit always in GUEST_IA32_DEBUGCTL fixed the problem for me)
> > >
> > > I also need to check if AMD also has this feature, or if this is Intel specific.
> >
> > Intel only. I assume/think/hope AMD's Host/Guest Only field in the event selector
> > effectively hides SMM from the guest.
>
> Just using the GuestOnly bit does not hide SMM activity from guests. SMIs are
> generally intercepted (kvm_amd.intercept_smi defaults to true)
Hi,
Actually this setting doesn't really work these days, at lesat not on my Zen2 machine (3070x).
Long ago I tested it, and despite loading the system with SMIs either via APIC or via 0xB2 ioport write,
where in both cases I noticed significant slowdown of a VM, pinned on the receiving CPU I got no SMI VM exits.
BIOS likely has an option to override this setting.
I guess the reason is security, because with SVM,
one can effectively block the SMIs from being processed on the host.
Best regards,
Maxim Levitsky
> and handled in the
> host context. So guest PMCs are isolated by a combination of having the GuestOnly
> bit set and the #VMEXITs resulting from SMI interception.
>