Re: [PATCH 3/3] KVM: x86: always stop emulation on page fault

From: Sean Christopherson
Date: Wed Aug 28 2019 - 10:23:44 EST


On Wed, Aug 28, 2019 at 10:19:51AM +0000, Jan Dakinevich wrote:
> On Tue, 27 Aug 2019 07:50:30 -0700
> Sean Christopherson <sean.j.christopherson@xxxxxxxxx> wrote:
> > Yikes, this patch and the previous have quite the sordid history.
> >
> >
> > The non-void return from inject_emulated_exception() was added by commit
> >
> > ef54bcfeea6c ("KVM: x86: skip writeback on injection of nested exception")
> >
> > for the purpose of skipping writeback. At the time, the above blob in the
> > decode flow didn't exist.
> >
> >
> > Decode exception handling was added by commit
> >
> > 6ea6e84309ca ("KVM: x86: inject exceptions produced by x86_decode_insn")
> >
> > but it was dead code even then. The patch discussion[1] even points out that
> > it was dead code, i.e. the change probably should have been reverted.
> >
> >
> > Peng Hao and Yi Wang later ran into what appears to be the same bug you're
> > hitting[2][3], and even had patches temporarily queued[4][5], but the
> > patches never made it to mainline as they broke kvm-unit-tests. Fun side
> > note, Radim even pointed out[4] the bug fixed by patch 1/3.
> >
> > So, the patches look correct, but there's the open question of why the
> > hypercall test was failing for Paolo.
>
> Sorry, I'm a little confused. Could you please point me to which test or
> tests were broken? I've just run kvm-unit-tests and I see the same results
> with and without my changes.
>
> > I've tried to reproduce the #DF to
> > no avail.

Aha! The #DF occurs if patch 2/3, but not patch 3/3, is applied, and the
VMware backdoor is enabled. The backdoor is off by default, which is why
only Paolo was seeing the #DF.

To handle the VMware backdoor, KVM intercepts #GP faults, including the
non-canonical #GP from the hypercall unit test. With only patch 2/3
applied, x86_emulate_instruction() injects a #GP for the non-canonical RIP
but returns EMULATE_FAIL instead of EMULATE_DONE. EMULATE_FAIL causes
handle_exception_nmi() (or gp_interception() for SVM) to re-inject the
original #GP because it thinks emulation failed due to a non-VMware opcode.

Applying patch 3/3 resolves the issue as x86_emulate_instruction() returns
EMULATE_DONE after injecting the #GP.
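
For anyone following along, here is a toy model of the flow described above.
It is only a sketch, not KVM code: every toy_* name and the patch3_applied
flag are made up for illustration, and the only architectural fact it relies
on is that a #GP raised while another #GP is pending escalates to #DF.

```c
#include <stdbool.h>

/* Simplified stand-ins for illustration; not the kernel's definitions. */
enum emu_result { EMULATE_DONE, EMULATE_FAIL };
enum { NO_EXCEPTION = -1, DF_VECTOR = 8, GP_VECTOR = 13 };

struct toy_vcpu {
	int pending;	/* pending exception vector, or NO_EXCEPTION */
};

/* Queueing a #GP while a #GP is already pending escalates to #DF. */
static void toy_queue_exception(struct toy_vcpu *v, int vector)
{
	if (v->pending == GP_VECTOR && vector == GP_VECTOR)
		v->pending = DF_VECTOR;
	else
		v->pending = vector;
}

/*
 * Hypothetical emulator entry point: it always injects the #GP for the
 * non-canonical RIP, but only reports EMULATE_DONE once patch 3/3 is in.
 */
static enum emu_result toy_emulate(struct toy_vcpu *v, bool patch3_applied)
{
	toy_queue_exception(v, GP_VECTOR);
	return patch3_applied ? EMULATE_DONE : EMULATE_FAIL;
}

/*
 * Hypothetical #GP intercept handler: on EMULATE_FAIL it assumes a
 * non-VMware opcode and re-injects the original #GP.  Returns the vector
 * that ends up pending.
 */
static int toy_handle_gp_intercept(struct toy_vcpu *v, bool patch3_applied)
{
	if (toy_emulate(v, patch3_applied) == EMULATE_FAIL)
		toy_queue_exception(v, GP_VECTOR);
	return v->pending;
}
```

Starting from a vCPU with pending == NO_EXCEPTION, calling
toy_handle_gp_intercept() with patch3_applied == false leaves DF_VECTOR
pending, whereas patch3_applied == true leaves only the single GP_VECTOR.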


TL;DR:

Swap the order of patches 2/3 and 3/3 and everything should be hunky-dory.
Please rebase to the latest kvm/queue, which has an equivalent to patch 1/3.