Re: [PATCH v3 6/6] KVM: selftests: test KVM_GUESTDBG_BLOCKIRQ

From: Sean Christopherson
Date: Mon Nov 01 2021 - 19:22:09 EST


On Mon, Nov 01, 2021, Maxim Levitsky wrote:
> On Mon, 2021-11-01 at 16:43 +0100, Vitaly Kuznetsov wrote:
> > Paolo Bonzini <pbonzini@xxxxxxxxxx> writes:
> >
> > > On 11/08/21 14:29, Maxim Levitsky wrote:
> > > > Modify debug_regs test to create a pending interrupt
> > > > and see that it is blocked when single stepping is done
> > > > with KVM_GUESTDBG_BLOCKIRQ
> > > >
> > > > Signed-off-by: Maxim Levitsky <mlevitsk@xxxxxxxxxx>
> > > > ---
> > > > .../testing/selftests/kvm/x86_64/debug_regs.c | 24 ++++++++++++++++---
> > > > 1 file changed, 21 insertions(+), 3 deletions(-)
> > >
> > > I haven't looked very much at this, but the test fails.
> > >
> >
> > Same here,
> >
> > the test passes on AMD but fails consistently on Intel:
> >
> > # ./x86_64/debug_regs
> > ==== Test Assertion Failure ====
> > x86_64/debug_regs.c:179: run->exit_reason == KVM_EXIT_DEBUG && run->debug.arch.exception == DB_VECTOR && run->debug.arch.pc == target_rip && run->debug.arch.dr6 == target_dr6
> > pid=13434 tid=13434 errno=0 - Success
> > 1 0x00000000004027c6: main at debug_regs.c:179
> > 2 0x00007f65344cf554: ?? ??:0
> > 3 0x000000000040294a: _start at ??:?
> > SINGLE_STEP[1]: exit 8 exception 1 rip 0x402a25 (should be 0x402a27) dr6 0xffff4ff0 (should be 0xffff4ff0)
> >
> > (I know I'm late to the party).
>
> Well that is strange. It passes on my intel laptop. Just tested
> (kvm/queue + qemu master, compiled today) :-(
>
> It fails on iteration 1 (and there is iteration 0) which I think means that we
> start with RIP on sti, and get #DB on start of xor instruction first (correctly),
> and then we get #DB again on start of xor instruction again?
>
> Something very strange. My laptop has i7-7600U.

I haven't verified on hardware, but my guess is that this code in vmx_vcpu_run()

	/* When single-stepping over STI and MOV SS, we must clear the
	 * corresponding interruptibility bits in the guest state. Otherwise
	 * vmentry fails as it then expects bit 14 (BS) in pending debug
	 * exceptions being set, but that's not correct for the guest debugging
	 * case. */
	if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP)
		vmx_set_interrupt_shadow(vcpu, 0);

interacts badly with APICv=1. It will kill the STI shadow and cause the IRQ in
vmcs.GUEST_RVI to be recognized when it (micro-)architecturally should not. My
head is going in circles trying to sort out what would actually happen. Maybe
comment that code out and/or disable APICv to see if either one makes the test
pass?
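
To spell the guess out, here is a toy model in plain C. It is not KVM code:
every name in it is made up for illustration, and the !APICv behavior (nothing
sitting in RVI for hardware to deliver) is an assumption, not something
verified on hardware. It only encodes the two experiments suggested above.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model of the guess above. NOT KVM code: every name is hypothetical
 * and the !APICv behavior is an assumption.
 */
struct toy_vcpu {
	bool irq_in_rvi;	/* IRQ pending in a stand-in for vmcs.GUEST_RVI */
	bool sti_shadow;	/* STI interrupt shadow still active */
};

/* Stand-in for the snippet quoted above: single-step entry drops the shadow. */
static void toy_prepare_single_step(struct toy_vcpu *v, bool clear_shadow)
{
	if (clear_shadow)
		v->sti_shadow = false;
}

/* In this model the pending IRQ is only recognized once the shadow is gone. */
static bool toy_irq_recognized(const struct toy_vcpu *v)
{
	return v->irq_in_rvi && !v->sti_shadow;
}

/*
 * Single-step over STI with an IRQ pending. Returns true if the IRQ is
 * recognized before the instruction after STI has had a chance to run.
 */
static bool toy_irq_taken_early(bool apicv, bool clear_shadow)
{
	struct toy_vcpu v = {
		/* Assumption: without APICv nothing sits in RVI for hardware. */
		.irq_in_rvi = apicv,
		.sti_shadow = true,
	};

	toy_prepare_single_step(&v, clear_shadow);
	return toy_irq_recognized(&v);
}

/* Usage: only APICv=1 combined with the shadow clearing misbehaves. */
static void toy_self_check(void)
{
	assert(toy_irq_taken_early(true, true));
	assert(!toy_irq_taken_early(true, false));
	assert(!toy_irq_taken_early(false, true));
	assert(!toy_irq_taken_early(false, false));
}
```

If the real interaction matches this model, either experiment (dropping the
vmx_set_interrupt_shadow() call or disabling APICv) should be enough on its own
to make the test pass.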