Re: [PATCH v6 1/7] KVM: x86: Deflect unknown MSR accesses to user space

From: Alexander Graf
Date: Wed Sep 16 2020 - 05:32:23 EST


Hi Aaron,

Thanks a lot for the amazing review! I've been caught up in some other things recently, so sorry for the delayed response.

On 03.09.20 21:27, Aaron Lewis wrote:

+::
+
+		/* KVM_EXIT_X86_RDMSR / KVM_EXIT_X86_WRMSR */
+		struct {
+			__u8 error; /* user -> kernel */
+			__u8 pad[3];
+			__u32 reason; /* kernel -> user */
+			__u32 index; /* kernel -> user */
+			__u64 data; /* kernel <-> user */
+		} msr;
+
+Used on x86 systems. When the VM capability KVM_CAP_X86_USER_SPACE_MSR is
+enabled, MSR accesses that would otherwise cause KVM kernel code to inject a
+#GP will instead trigger a KVM_EXIT_X86_RDMSR exit for reads and a
+KVM_EXIT_X86_WRMSR exit for writes.
+
+The "reason" field specifies why the MSR trap occurred. User space will only
+receive MSR exit traps when a particular reason was requested through
+ENABLE_CAP. Currently valid exit reasons are:
+
+ KVM_MSR_EXIT_REASON_INVAL - access to invalid MSRs or reserved bits


Can we also have ENOENT?
KVM_MSR_EXIT_REASON_ENOENT - Unknown MSR

I tried to add that at first, but it gets tricky really fast. Why should user space have a vested interest in differentiating between "MSR is not implemented", "MSR is guarded by a CPUID flag and thus not handled", and "MSR is guarded by a CAP"?

The more details we reveal, the more likely we're to break ABI compatibility.
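
For reference, requesting the exits is then just a reason mask passed to
ENABLE_CAP, roughly like this (a sketch; vm_fd stands for the VM file
descriptor returned by KVM_CREATE_VM):

	struct kvm_enable_cap cap = {
		.cap = KVM_CAP_X86_USER_SPACE_MSR,
		/* trap the accesses KVM would otherwise answer with a #GP */
		.args[0] = KVM_MSR_EXIT_REASON_INVAL,
	};

	ioctl(vm_fd, KVM_ENABLE_CAP, &cap);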



+
+For KVM_EXIT_X86_RDMSR, the "index" field tells user space which MSR the guest
+wants to read. To respond to this request with a successful read, user space
+writes the respective data into the "data" field and must continue guest
+execution to ensure the read data is transferred into guest register state.
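
To make the flow concrete, user space would service these exits roughly
as sketched below (vmm_rdmsr()/vmm_wrmsr() are hypothetical VMM-side
emulation helpers that return 0 on success and non-zero on failure):

	#include <stdint.h>
	#include <linux/kvm.h>

	int vmm_rdmsr(uint32_t index, uint64_t *data); /* hypothetical */
	int vmm_wrmsr(uint32_t index, uint64_t data);  /* hypothetical */

	/* called when KVM_RUN has returned with one of the new exit reasons */
	static void handle_msr_exit(struct kvm_run *run)
	{
		switch (run->exit_reason) {
		case KVM_EXIT_X86_RDMSR:
			/* a non-zero "error" asks KVM to inject a #GP instead */
			run->msr.error = vmm_rdmsr(run->msr.index, &run->msr.data);
			break;
		case KVM_EXIT_X86_WRMSR:
			run->msr.error = vmm_wrmsr(run->msr.index, run->msr.data);
			break;
		}
		/* then re-enter KVM_RUN so the result reaches guest registers */
	}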

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 88c593f83b28..4d285bf054fb 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1549,12 +1549,88 @@ int kvm_set_msr(struct kvm_vcpu *vcpu, u32 index, u64 data)
 }
 EXPORT_SYMBOL_GPL(kvm_set_msr);

+static int complete_emulated_msr(struct kvm_vcpu *vcpu, bool is_read)
+{
+	if (vcpu->run->msr.error) {
+		kvm_inject_gp(vcpu, 0);

Add return 1. The RIP doesn’t advance when the instruction raises a fault.

Yikes. Good catch! Thank you!
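
The error path will then look like this:

	if (vcpu->run->msr.error) {
		kvm_inject_gp(vcpu, 0);
		return 1;	/* #GP injected, RIP must not advance */
	}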



+	} else if (is_read) {
+		kvm_rax_write(vcpu, (u32)vcpu->run->msr.data);
+		kvm_rdx_write(vcpu, vcpu->run->msr.data >> 32);
+	}
+
+	return kvm_skip_emulated_instruction(vcpu);
+}
+
+static int complete_emulated_rdmsr(struct kvm_vcpu *vcpu)
+{
+	return complete_emulated_msr(vcpu, true);
+}
+
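
For readers following along: the (snipped) producer side fills in the
kvm_run fields and registers the completion handler before bouncing to
user space, along these lines (a sketch of the idea, not the actual
hunk; "index" is the MSR number of the faulting access):

	vcpu->run->exit_reason = KVM_EXIT_X86_RDMSR;
	vcpu->run->msr.reason = KVM_MSR_EXIT_REASON_INVAL;
	vcpu->run->msr.index = index;
	vcpu->arch.complete_userspace_io = complete_emulated_rdmsr;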

diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -412,6 +414,15 @@ struct kvm_run {
 			__u64 esr_iss;
 			__u64 fault_ipa;
 		} arm_nisv;
+		/* KVM_EXIT_X86_RDMSR / KVM_EXIT_X86_WRMSR */
+		struct {
+			__u8 error; /* user -> kernel */
+			__u8 pad[3];

__u8 pad[7] to maintain 8 byte alignment? Unless we can get away with
fewer bits for 'reason' and get them from 'pad'.

Why would we need 8-byte alignment here? I always thought natural u64 alignment on x86_64 was 4 bytes?
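
A quick way to check what the compilers actually do with the two
layouts (a stand-alone test, using stdint stand-ins for the kernel
__u8/__u32/__u64 types):

	#include <stdio.h>
	#include <stddef.h>
	#include <stdint.h>

	struct msr_pad3 {	/* layout as in this patch */
		uint8_t  error;
		uint8_t  pad[3];
		uint32_t reason;
		uint32_t index;
		uint64_t data;
	};

	struct msr_pad7 {	/* layout with Aaron's suggestion */
		uint8_t  error;
		uint8_t  pad[7];
		uint32_t reason;
		uint32_t index;
		uint64_t data;
	};

	int main(void)
	{
		/*
		 * The x86_64 SysV ABI aligns u64 to 8 bytes, so with pad[3]
		 * the compiler inserts 4 hidden bytes before "data"; the
		 * i386 ABI aligns u64 to 4 bytes and does not, so 32-bit
		 * and 64-bit user space would disagree on the layout. With
		 * pad[7] both agree.
		 */
		printf("pad[3]: data at %zu, sizeof %zu\n",
		       offsetof(struct msr_pad3, data), sizeof(struct msr_pad3));
		printf("pad[7]: data at %zu, sizeof %zu\n",
		       offsetof(struct msr_pad7, data), sizeof(struct msr_pad7));
		return 0;
	}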


Alex


