Re: [PATCH kernel v5 5/6] KVM: SEV: Enable data breakpoints in SEV-ES

From: Alexey Kardashevskiy
Date: Thu May 25 2023 - 23:17:04 EST

On 24/5/23 01:44, Sean Christopherson wrote:
On Tue, May 23, 2023, Alexey Kardashevskiy wrote:

On 23/5/23 09:39, Sean Christopherson wrote:
On Tue, Apr 11, 2023, Alexey Kardashevskiy wrote:
Prior to SEV-ES, KVM saved/restored host debug registers upon switching
to/from a VM. Changing those registers inside a running SEV VM
triggered #VMEXIT to KVM.

Please, please don't make it sound like some behavior is *the* one and only behavior.
*KVM* *chooses* to intercept debug register accesses. Though I would omit this
paragraph (and largely the next) entirely, IMO it's safe to assume the reader has
a basic understanding of how KVM deals with DRs and #DBs.

Out of curiosity - is ARM similar in this regard, interceptions and stuff?

Yes. The granularity of interceptions varies based on the architectural revision,
and presumably there are things that always trap. But the core concept of letting
software decide what to intercept is the same.

SEV-ES added encrypted state (ES) which uses an encrypted page
for the VM state (VMSA). The hardware saves/restores certain registers
on VMRUN/VMEXIT according to a swap type (A, B, C), see
"Table B-3. Swap Types" in the AMD Architecture Programmer’s Manual
volume 2.

The DR6 and DR7 registers have always been swapped as Type A for SEV-ES

Please rewrite this to just state what the behavior is instead of "Type A" vs.
"Type B". Outside of AMD, the "type a/b/c" stuff isn't anywhere near ubiquitous
enough to justify obfuscating super simple concepts.

Actually, this feature really has nothing to do with Type A vs. Type B, since
that's purely about host state.

Mmm. Nothing? If the feature is enabled and DR[0-3] are not saved in HOSTSA,
then the host looses debug state because of the working feature.

I assume the switch from Type A to Type B is a
side effect, or an opportunistic optimization?

There is no switch. DR[67] were and are one type, and the other DRs were not
swapped and now are, but with a different Swap Type.

Ah, this is what I missed.

And the patch saves DR[0-3] in HOSTSA but not DR[67] and this deserves some
explaining why is that.

Regardless, something like this is a lot easier for a human to read. It's easy
enough to find DebugSwap in the APM (literally took me longer to find my copy of
the PDF).

It is also easy to find "Swap Types" in the APM...

Redirecting readers to specs for gory details is fine. Redirecting for basic,
simple functionality is not. Maybe the swap types will someday be common knowledge
in KVM, and an explanation will no longer be necessary, but we are nowhere near
that point.

Add support for "DebugSwap for SEV-ES guests", which provides support
for swapping DR[0-3] and DR[0-3]_ADDR_MASK on VMRUN and VMEXIT, i.e.
allows KVM to expose debug capabilities to SEV-ES guests. Without
DebugSwap support, the CPU doesn't save/load _guest_ debug registers,

But it does save/load DR6 and DR7.

and KVM cannot manually context switch guest DRs due the VMSA being

Enable DebugSwap if and only if the CPU also supports NoNestedDataBp,
which causes the CPU to ignore nested #DBs, i.e. #DBs that occur when
vectoring a #DB.

(english question) What does "vectoring" mean here precisely? Handling?

It's not really an English thing. "Vectoring" is, or at least was, Intel terminology
for describing the flow from the initial detection of an exception to the exception's
delivery to software, e.g. covers the IDT lookup, any GDT/LDT lookups, pushing
information on the stack, fetching the software exception handler, etc. Importantly,
it describes the period where there are no instructions retired and thus ucode doesn't
open event windows, i.e. where triggering another, non-contributory exception will
effectively "hang" the CPU (at least on CPUs without Intel's "notify" VM-Exit support).

the host by putting the CPU into an infinite loop of vectoring #DBs

Good one, thanks for the link.

Set the features bit in sev_es_sync_vmsa() which is the last point
when VMSA is not encrypted yet as sev_(es_)init_vmcb() (where the most
init happens) is called not only when VCPU is initialized but also on
intrahost migration when VMSA is encrypted.

guests, but a new feature is available, identified via
CPUID Fn8000001F_EAX[14] "DebugSwap for SEV-ES guests", that provides
support for swapping additional debug registers. DR[0-3] and
DR[0-3]_ADDR_MASK are swapped as Type B when SEV_FEATURES[5] (DebugSwap)
is set.

Enable DebugSwap for a VMSA but only do so if CPUID Fn80000021_EAX[0]
("NoNestedDataBp", "Processor ignores nested data breakpoints") is
supported by the SOC as otherwise a malicious SEV-ES guest can set up
data breakpoints on the #DB IDT entry/stack and cause an infinite loop.
Set the features bit in sev_es_sync_vmsa() which is the last point
when VMSA is not encrypted yet as sev_(es_)init_vmcb() (where the most
init happens) is called not only when VCPU is initialized but also on
intrahost migration when VMSA is encrypted.

Eliminate DR7 and #DB intercepts as:
- they are not needed when DebugSwap is supported;

"not needed" isn't sufficient justification. KVM doesn't strictly need to do a
lot of things, but does them anyways for a variety of reasons. E.g. #DB intercept
is also not needed when NoNestedDataBp is supported and vcpu->guest_debug==0, i.e.
this should clarify why KVM doesn't simply disable #DB intercepts for _all_ VMs
when NoNestedDataBp is support. Presumably the answer is "because it's simpler
than toggling #DB interception for guest_debug.

TBH I did not have a good answer but from the above I'd say we want to
disable #DB intercepts in a separate patch, just as you say below.

Actually, can't disabling #DB interception for DebugSwap SEV-ES guests be a
separate patch? KVM can still inject #DBs for SEV-ES guests, no?

Sorry for my ignorance but what is the point of injecting #DB if there is no
way of changing the guest's DR7?

Well, _injecting_ the #DB is necessary for correctness from the guest's perspective.
"What's the point of _intercepting_ #DB" is the real question. And for SEV-ES guests
with DebugSwap, there is no point, which is why I agree that KVM should disable
interception in that case. What I'm calling out is that disabling #Db interception
isn't _necessary_ for correctness (unless I'm missing something), which means that
it can and should go in a separate patch.

About this. Instead of sev_es_init_vmcb(), I can toggle the #DB intercept when toggling guest_debug, see below. This kvm_x86_ops::update_exception_bitmap hook is called on vcpu reset and kvm_arch_vcpu_ioctl_set_guest_debug (which skips this call if guest_state_protected = true). Is there any downside?

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index da28ed69bf4a..a7df5eb4ac00 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1951,9 +1951,15 @@ static void svm_update_exception_bitmap(struct kvm_vcpu *vcpu)

clr_exception_intercept(svm, BP_VECTOR);

+ if (cpu_feature_enabled(X86_FEATURE_NO_NESTED_DATA_BP))
+ clr_exception_intercept(svm, DB_VECTOR);
if (vcpu->guest_debug & KVM_GUESTDBG_ENABLE) {
if (vcpu->guest_debug & KVM_GUESTDBG_USE_SW_BP)
set_exception_intercept(svm, BP_VECTOR);
+ if (vcpu->guest_debug & KVM_GUESTDBG_USE_HW_BP)
+ set_exception_intercept(svm, DB_VECTOR);