Re: [PATCH v2 3/3] kvm: svm: Use the hardware provided GPA instead of page walk

From: Brijesh Singh
Date: Mon Dec 12 2016 - 12:51:17 EST


Hi Paolo,


On 12/09/2016 09:41 AM, Paolo Bonzini wrote:

I am able to reproduce it on AMD HW using kvm-unit-tests. Looking at
the test, my initial thought is that "push mem" has two operands (the
memory being pushed and the stack pointer). The provided GPA could be
for either one of those.

Aha, this makes sense---and it's easy to handle too, since you can just
add a flag to the decode table and extend the string op case to cover
that flag too. Detecting cross-page MMIO is more problematic; I don't
have an idea offhand of how to solve it.


I added a new flag, "TwoMemOp", and with it the "push mem" and "pop mem" tests pass. If you are okay with this approach, I can convert it into a patch for review.

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 0ea543e..c86dc1d 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -171,6 +171,7 @@
#define NearBranch ((u64)1 << 52) /* Near branches */
#define No16 ((u64)1 << 53) /* No 16 bit operand */
#define IncSP ((u64)1 << 54) /* SP is incremented before ModRM calc */
+#define TwoMemOp ((u64)1 << 55) /* Instruction has two memory operands */

#define DstXacc (DstAccLo | SrcAccHi | SrcWrite)

@@ -4104,7 +4105,7 @@ static const struct opcode group1[] = {
};

static const struct opcode group1A[] = {
- I(DstMem | SrcNone | Mov | Stack | IncSP, em_pop), N, N, N, N, N, N, N,
+ I(DstMem | SrcNone | Mov | Stack | IncSP | TwoMemOp, em_pop), N, N, N, N, N, N, N,
};

static const struct opcode group2[] = {
@@ -4142,7 +4143,7 @@


I(SrcMemFAddr | ImplicitOps, em_jmp_far),
- I(SrcMem | Stack, em_push), D(Undefined),
+ I(SrcMem | Stack | TwoMemOp, em_push), D(Undefined),


static const struct opcode group6[] = {
@@ -4331,8 +4332,8 @@


I(DstReg | SrcMem | ModRM | Src2ImmByte, em_imul_3op),

- I2bvIP(SrcSI | DstDX | String, em_out, outs, check_perm_out), /* outsb, outsw/outsd */
+ b, insw/insd */
+ */

/* 0x80 - 0x87 */
@@ -4360,13 +4361,13 @@ static const struct opcode opcode_table[256] = {
/* 0xA0 - 0xA7 */
I2bv(DstAcc | SrcMem | Mov | MemAbs, em_mov),
I2bv(DstMem | SrcAcc | Mov | MemAbs | PageTable, em_mov),
- I2bv(SrcSI | DstDI | Mov | String, em_mov),
- F2bv(SrcSI | DstDI | String | NoWrite, em_cmp_r),
+ I2bv(SrcSI | DstDI | Mov | String | TwoMemOp, em_mov),
+ F2bv(SrcSI | DstDI | String | NoWrite | TwoMemOp, em_cmp_r),
/* 0xA8 - 0xAF */
F2bv(DstAcc | SrcImm | NoWrite, em_test),
- I2bv(SrcAcc | DstDI | Mov | String, em_mov),
- I2bv(SrcSI | DstAcc | Mov | String, em_mov),
- F2bv(SrcAcc | DstDI | String | NoWrite, em_cmp_r),
+ I2bv(SrcAcc | DstDI | Mov | String | TwoMemOp, em_mov),
+ I2bv(SrcSI | DstAcc | Mov | String | TwoMemOp, em_mov),
+ F2bv(SrcAcc | DstDI | String | NoWrite | TwoMemOp, em_cmp_r),
/* 0xB0 - 0xB7 */
X8(I(ByteOp | DstReg | SrcImm | Mov, em_mov)),
/* 0xB8 - 0xBF */
@@ -5484,10 +5485,7 @@ void emulator_writeback_register_cache(struct x86_emulate_ctxt *ctxt)
writeback_registers(ctxt);
}

-bool emulator_is_string_op(struct x86_emulate_ctxt *ctxt)
+bool emulator_is_two_memory_op(struct x86_emulate_ctxt *ctxt)
{
- if (ctxt->d & String)
- return true;
-
- return false;
+ return ctxt->d & TwoMemOp ? true : false;
}
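
For readers following along, a minimal standalone sketch of how a 64-bit decode flag such as TwoMemOp can be tested. The struct and the `sketch_` names are simplified stand-ins and not the emulator's real types; only the bit positions follow this thread (bit 13 for String, as in the CTXT_STRING_OP definition of the v2 patch, and bit 55 for TwoMemOp).

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions taken from the patches in this thread. */
#define SKETCH_STRING     ((uint64_t)1 << 13)
#define SKETCH_TWO_MEM_OP ((uint64_t)1 << 55)

/* Hypothetical, cut-down stand-in for the emulator context. */
struct sketch_ctxt {
	uint64_t d;	/* decode flags of the current instruction */
};

/*
 * Test the flag against the full 64-bit value. The explicit comparison
 * keeps the result correct even if the return type were ever narrowed
 * to int, where a plain "d & flag" would lose bit 55.
 */
static bool sketch_is_two_memory_op(const struct sketch_ctxt *ctxt)
{
	return (ctxt->d & SKETCH_TWO_MEM_OP) != 0;
}
```

With this shape, an entry carrying only String reports false, while "push mem" style entries and string entries tagged with TwoMemOp report true.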



If we can detect those cases, we should not set the gpa_available on
them (like what we do with string move).

It would forbid usage of "push/pop mem" instructions with MMIO for SEV,
right? It probably doesn't happen much in practice, but it's unfortunate.


As per the AMD BKDG [1], Section 2.7.1, we should not be using any of these instructions for MMIO access; the behavior is undefined.

The question is: do we really need to add logic to detect cross-page MMIO accesses and push/pop mem operations so that we pass kvm-unit-tests, or should we update the unit test? As you said, detecting cross-page MMIO accesses is going to be a bit tricky.
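
To make the cross-page case concrete, here is a small sketch of the boundary test alone. It assumes 4 KiB pages and uses hypothetical `sketch_` names; it does not solve the harder part discussed here, namely that the hardware reports a single GPA, so the second page would still need its own translation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumption: 4 KiB pages */

/* True if the access [gpa, gpa + len) touches more than one page. */
static bool sketch_crosses_page(uint64_t gpa, size_t len)
{
	uint64_t offset = gpa & (SKETCH_PAGE_SIZE - 1);

	return len != 0 && offset + len > SKETCH_PAGE_SIZE;
}
```

For example, a 4-byte access at offset 0xffc ends exactly at the page boundary and does not cross it, while an 8-byte access at the same offset does.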

Thoughts?

[1] http://support.amd.com/TechDocs/52740_16h_Models_30h-3Fh_BKDG.pdf


Paolo

We probably haven't hit this case in guest booting. I will investigate a
bit further and provide an updated patch to handle it.

-Brijesh

The VMX patch to set gpa_available is just this:

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 25d48380c312..5d7b60d4795b 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -6393,6 +6393,7 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
/* ept page table is present? */
error_code |= (exit_qualification & 0x38) != 0;

+ vcpu->arch.gpa_available = true;
vcpu->arch.exit_qualification = exit_qualification;

return kvm_mmu_page_fault(vcpu, gpa, error_code, NULL, 0);
@@ -6410,6 +6411,7 @@ static int handle_ept_misconfig(struct kvm_vcpu *vcpu)
}

ret = handle_mmio_page_fault(vcpu, gpa, true);
+ vcpu->arch.gpa_available = true;
if (likely(ret == RET_MMIO_PF_EMULATE))
return x86_emulate_instruction(vcpu, gpa, 0, NULL, 0) ==
EMULATE_DONE;
@@ -8524,6 +8526,7 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu)
u32 vectoring_info = vmx->idt_vectoring_info;

trace_kvm_exit(exit_reason, vcpu, KVM_ISA_VMX);
+ vcpu->arch.gpa_available = false;

/*
* Flush logged GPAs PML buffer, this will make dirty_bitmap more

Thanks,

Paolo
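
The lifetime of the flag in the VMX hunks above, and in the one-line SVM variant of the v2 patch, can be sketched as follows. This is a simplified model with hypothetical names, not the real exit-handling code: the flag is cleared on every exit and set only when the exit itself delivered a GPA.

```c
#include <stdbool.h>

/* Hypothetical exit reasons; the real code uses VMX/SVM exit codes. */
enum sketch_exit {
	SKETCH_EXIT_NESTED_FAULT,	/* EPT violation/misconfig or NPF */
	SKETCH_EXIT_OTHER,
};

/* Hypothetical, cut-down per-vCPU state. */
struct sketch_vcpu_state {
	bool gpa_available;
};

static void sketch_handle_exit(struct sketch_vcpu_state *vcpu,
			       enum sketch_exit reason)
{
	/* Clear first so a stale GPA from a previous exit is never reused. */
	vcpu->gpa_available = false;

	if (reason == SKETCH_EXIT_NESTED_FAULT)
		vcpu->gpa_available = true;	/* hardware supplied the GPA */
}
```

The point of clearing unconditionally is that an emulation triggered by some later, unrelated exit must fall back to the page walk.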

---
arch/x86/include/asm/kvm_emulate.h | 3 +++
arch/x86/include/asm/kvm_host.h | 3 +++
arch/x86/kvm/svm.c | 2 ++
arch/x86/kvm/x86.c | 17 ++++++++++++++++-
4 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index e9cd7be..2d1ac09 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -344,6 +344,9 @@ struct x86_emulate_ctxt {
struct read_cache mem_read;
};

+/* String operation identifier (matches the definition in emulate.c) */
+#define CTXT_STRING_OP (1 << 13)
+
/* Repeat String Operation Prefix */
#define REPE_PREFIX 0xf3
#define REPNE_PREFIX 0xf2
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 77cb3f9..fd5b1c8 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -668,6 +668,9 @@ struct kvm_vcpu_arch {

int pending_ioapic_eoi;
int pending_external_vector;
+
+ /* GPA available (AMD only) */
+ bool gpa_available;
};

struct kvm_lpage_info {
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 5e64e656..1bbd04c 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -4246,6 +4246,8 @@ static int handle_exit(struct kvm_vcpu *vcpu)
return 1;
}

+ vcpu->arch.gpa_available = (exit_code == SVM_EXIT_NPF);
+
return svm_exit_handlers[exit_code](svm);
}

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c30f62dc..5002eea 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4441,7 +4441,19 @@ static int vcpu_mmio_gva_to_gpa(struct kvm_vcpu *vcpu, unsigned long gva,
return 1;
}

- *gpa = vcpu->arch.walk_mmu->gva_to_gpa(vcpu, gva, access, exception);
+ /*
+ * If the exit was due to a NPF we may already have a GPA.
+ * If the GPA is present, use it to avoid the GVA to GPA table
+ * walk. Note, this cannot be used on string operations since
+ * string operation using rep will only have the initial GPA
+ * from when the NPF occurred.
+ */
+ if (vcpu->arch.gpa_available &&
+ !(vcpu->arch.emulate_ctxt.d & CTXT_STRING_OP))
+ *gpa = exception->address;
+ else
+ *gpa = vcpu->arch.walk_mmu->gva_to_gpa(vcpu, gva, access,
+ exception);

if (*gpa == UNMAPPED_GVA)
return -1;
@@ -5563,6 +5575,9 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu,
}

restart:
+ /* Save the faulting GPA (cr2) in the address field */
+ ctxt->exception.address = cr2;
+
r = x86_emulate_insn(ctxt);

if (r == EMULATION_INTERCEPTED)
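
As an illustration only, the GPA selection in vcpu_mmio_gva_to_gpa() could look roughly like this once the String check is replaced by the TwoMemOp flag proposed earlier in this thread. Everything here is a cut-down model under that assumption: the struct, the `sketch_` names, and the fake page walk are hypothetical and stand in for the real kvm_vcpu state and walk_mmu->gva_to_gpa().

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_TWO_MEM_OP ((uint64_t)1 << 55)

typedef uint64_t sketch_gpa_t;

/* Hypothetical, cut-down vCPU state; not the real struct kvm_vcpu. */
struct sketch_vcpu {
	bool gpa_available;		/* set by the exit handler on NPF */
	sketch_gpa_t fault_gpa;		/* GPA reported by hardware */
	uint64_t decode_flags;		/* flags of the emulated instruction */
	sketch_gpa_t (*walk)(uint64_t gva);	/* software GVA->GPA walk */
};

/* Stand-in walk for demonstration: pretends GVA maps to GVA + 0x1000. */
static sketch_gpa_t sketch_fake_walk(uint64_t gva)
{
	return gva + 0x1000;
}

/*
 * Use the hardware GPA only when it is valid for this exit and the
 * instruction has a single memory operand; otherwise walk the tables.
 */
static sketch_gpa_t sketch_mmio_gpa(const struct sketch_vcpu *vcpu,
				    uint64_t gva)
{
	if (vcpu->gpa_available &&
	    !(vcpu->decode_flags & SKETCH_TWO_MEM_OP))
		return vcpu->fault_gpa;

	return vcpu->walk(gva);
}
```

An instruction with two memory operands always takes the slow path here, because the single reported GPA cannot be attributed to one operand with certainty.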
