Re: [PATCH] KVM: X86: Don't reset dr6 unconditionally when the vcpu being scheduled out

From: Sean Christopherson
Date: Mon Aug 09 2021 - 12:54:12 EST


On Mon, Aug 09, 2021, Lai Jiangshan wrote:
> From: Lai Jiangshan <laijs@xxxxxxxxxxxxxxxxx>
>
> The commit efdab992813fb ("KVM: x86: fix escape of guest dr6 to the host")
> fixed a bug. It did a great job and reset dr6 unconditionally when the
> vcpu being scheduled out since the linux kernel only activates breakpoints
> in rescheduling (perf events).
>
> But writing to debug registers is slow, and it can be shown in perf results
> sometimes even neither the host nor the guest activate breakpoints.
>
> It'd be better to reset it conditionally and this patch moves the code of
> reseting dr6 to the path of VM-exit where we can check the related
> conditions. And the comment is also updated.
>
> Signed-off-by: Lai Jiangshan <laijs@xxxxxxxxxxxxxxxxx>
> ---
> And even in the future, the linux kernel might activate breakpoint
> in interrupts (rather than in rescheduling only), the host would
> not be confused by the stale dr6 after this patch. The possible future
> author who would implement it wouldn't need to care about how the kvm
> mananges debug registers since it sticks to the host's expectations.

Eh, I don't think this is a valid argument. KGBD already manipulates breakpoints
in NMI context, activating breakpoints from interrupt context would absolutely
require a full audit of the kernel.

> Another solution is changing breakpoint.c and make the linux kernel
> always reset dr6 in activating breakpoints. So that dr6 is allowed
> to be stale when host breakpoints is not enabled and we don't need
> to reset dr6 in this case. But it would be no harm to reset it even in
> both places and killing stale states is good in programing.

Hmm, other than further penalizing guests that enable debug breakpoints.

What about adding a arch.switch_db_regs flag to note that DR6 is loaded with a
guest value and keeping the reset code in kvm_arch_vcpu_put()?

> arch/x86/kvm/x86.c | 29 +++++++++++++++++++++--------
> 1 file changed, 21 insertions(+), 8 deletions(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 4116567f3d44..39b5dad2dd19 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -4310,12 +4310,6 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>
> static_call(kvm_x86_vcpu_put)(vcpu);
> vcpu->arch.last_host_tsc = rdtsc();
> - /*
> - * If userspace has set any breakpoints or watchpoints, dr6 is restored
> - * on every vmexit, but if not, we might have a stale dr6 from the
> - * guest. do_debug expects dr6 to be cleared after it runs, do the same.
> - */
> - set_debugreg(0, 6);
> }
>
> static int kvm_vcpu_ioctl_get_lapic(struct kvm_vcpu *vcpu,
> @@ -9375,6 +9369,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
> fastpath_t exit_fastpath;
>
> bool req_immediate_exit = false;
> + bool reset_dr6 = false;
>
> /* Forbid vmenter if vcpu dirty ring is soft-full */
> if (unlikely(vcpu->kvm->dirty_ring_size &&
> @@ -9601,6 +9596,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
> set_debugreg(vcpu->arch.eff_db[3], 3);
> set_debugreg(vcpu->arch.dr6, 6);

Not directly related to this patch, but why does KVM_DEBUGREG_RELOAD exist?
Commit ae561edeb421 ("KVM: x86: DR0-DR3 are not clear on reset") added it to
ensure DR0-3 are fresh when they're modified through non-standard paths, but I
don't see any reason why the new values _must_ be loaded into hardware. eff_db
needs to be updated, but I don't see why hardware DRs need to be updated unless
hardware breakpoints are active or DR exiting is disabled, and in those cases
updating hardware is handled by KVM_DEBUGREG_WONT_EXIT and KVM_DEBUGREG_BP_ENABLED.

Am I missing something?

> vcpu->arch.switch_db_regs &= ~KVM_DEBUGREG_RELOAD;
> + reset_dr6 = true;
> } else if (unlikely(hw_breakpoint_active())) {
> set_debugreg(0, 7);
> }
> @@ -9631,17 +9627,34 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
> kvm_update_dr0123(vcpu);
> kvm_update_dr7(vcpu);
> vcpu->arch.switch_db_regs &= ~KVM_DEBUGREG_RELOAD;
> + reset_dr6 = true;
> }
>
> /*
> * If the guest has used debug registers, at least dr7
> * will be disabled while returning to the host.
> + *
> + * If we have active breakpoints in the host, restore the old state.
> + *
> * If we don't have active breakpoints in the host, we don't
> - * care about the messed up debug address registers. But if
> - * we have some of them active, restore the old state.
> + * care about the messed up debug address registers but dr6
> + * which is expected to be cleared normally. Otherwise we might
> + * leak a stale dr6 from the guest and confuse the host since
> + * neither the host reset dr6 on activating next breakpoints nor
> + * the hardware clear it on dilivering #DB. The Intel SDM says:
> + *
> + * Certain debug exceptions may clear bits 0-3. The remaining
> + * contents of the DR6 register are never cleared by the
> + * processor. To avoid confusion in identifying debug
> + * exceptions, debug handlers should clear the register before
> + * returning to the interrupted task.
> + *
> + * Keep it simple and live up to expectations: clear DR6 immediately.
> */
> if (hw_breakpoint_active())
> hw_breakpoint_restore();
> + else if (unlikely(reset_dr6))
> + set_debugreg(DR6_RESERVED, 6);
>
> vcpu->arch.last_vmentry_cpu = vcpu->cpu;
> vcpu->arch.last_guest_tsc = kvm_read_l1_tsc(vcpu, rdtsc());
> --
> 2.19.1.6.gb485710b
>