[PATCH] KVM: X86: Don't reset dr6 unconditionally when the vcpu being scheduled out

From: Lai Jiangshan
Date: Mon Aug 09 2021 - 05:14:16 EST


From: Lai Jiangshan <laijs@xxxxxxxxxxxxxxxxx>

The commit efdab992813fb ("KVM: x86: fix escape of guest dr6 to the host")
fixed a bug. It did a great job and reset dr6 unconditionally when the
vcpu being scheduled out since the linux kernel only activates breakpoints
in rescheduling (perf events).

But writing to debug registers is slow, and it can be shown in perf results
sometimes even neither the host nor the guest activate breakpoints.

It'd be better to reset it conditionally and this patch moves the code of
reseting dr6 to the path of VM-exit where we can check the related
conditions. And the comment is also updated.

Signed-off-by: Lai Jiangshan <laijs@xxxxxxxxxxxxxxxxx>
---
And even in the future, the linux kernel might activate breakpoint
in interrupts (rather than in rescheduling only), the host would
not be confused by the stale dr6 after this patch. The possible future
author who would implement it wouldn't need to care about how the kvm
mananges debug registers since it sticks to the host's expectations.

Another solution is changing breakpoint.c and make the linux kernel
always reset dr6 in activating breakpoints. So that dr6 is allowed
to be stale when host breakpoints is not enabled and we don't need
to reset dr6 in this case. But it would be no harm to reset it even in
both places and killing stale states is good in programing.

arch/x86/kvm/x86.c | 29 +++++++++++++++++++++--------
1 file changed, 21 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4116567f3d44..39b5dad2dd19 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4310,12 +4310,6 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)

static_call(kvm_x86_vcpu_put)(vcpu);
vcpu->arch.last_host_tsc = rdtsc();
- /*
- * If userspace has set any breakpoints or watchpoints, dr6 is restored
- * on every vmexit, but if not, we might have a stale dr6 from the
- * guest. do_debug expects dr6 to be cleared after it runs, do the same.
- */
- set_debugreg(0, 6);
}

static int kvm_vcpu_ioctl_get_lapic(struct kvm_vcpu *vcpu,
@@ -9375,6 +9369,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
fastpath_t exit_fastpath;

bool req_immediate_exit = false;
+ bool reset_dr6 = false;

/* Forbid vmenter if vcpu dirty ring is soft-full */
if (unlikely(vcpu->kvm->dirty_ring_size &&
@@ -9601,6 +9596,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
set_debugreg(vcpu->arch.eff_db[3], 3);
set_debugreg(vcpu->arch.dr6, 6);
vcpu->arch.switch_db_regs &= ~KVM_DEBUGREG_RELOAD;
+ reset_dr6 = true;
} else if (unlikely(hw_breakpoint_active())) {
set_debugreg(0, 7);
}
@@ -9631,17 +9627,34 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
kvm_update_dr0123(vcpu);
kvm_update_dr7(vcpu);
vcpu->arch.switch_db_regs &= ~KVM_DEBUGREG_RELOAD;
+ reset_dr6 = true;
}

/*
* If the guest has used debug registers, at least dr7
* will be disabled while returning to the host.
+ *
+ * If we have active breakpoints in the host, restore the old state.
+ *
* If we don't have active breakpoints in the host, we don't
- * care about the messed up debug address registers. But if
- * we have some of them active, restore the old state.
+ * care about the messed up debug address registers but dr6
+ * which is expected to be cleared normally. Otherwise we might
+ * leak a stale dr6 from the guest and confuse the host since
+ * neither the host reset dr6 on activating next breakpoints nor
+ * the hardware clear it on dilivering #DB. The Intel SDM says:
+ *
+ * Certain debug exceptions may clear bits 0-3. The remaining
+ * contents of the DR6 register are never cleared by the
+ * processor. To avoid confusion in identifying debug
+ * exceptions, debug handlers should clear the register before
+ * returning to the interrupted task.
+ *
+ * Keep it simple and live up to expectations: clear DR6 immediately.
*/
if (hw_breakpoint_active())
hw_breakpoint_restore();
+ else if (unlikely(reset_dr6))
+ set_debugreg(DR6_RESERVED, 6);

vcpu->arch.last_vmentry_cpu = vcpu->cpu;
vcpu->arch.last_guest_tsc = kvm_read_l1_tsc(vcpu, rdtsc());
--
2.19.1.6.gb485710b