Re: [PATCH v3 1/3] KVM: x86: Deflect unknown MSR accesses to user space

From: Alexander Graf
Date: Mon Aug 03 2020 - 06:08:53 EST

On 01.08.20 01:36, Jim Mattson wrote:

On Fri, Jul 31, 2020 at 2:50 PM Alexander Graf <graf@xxxxxxxxxx> wrote:

MSRs are weird. Some of them are normal control registers, such as EFER.
Some however are registers that really are model specific, not very
interesting to virtualization workloads, and not performance critical.
Others again are really just windows into package configuration.

Out of these MSRs, only the first category is necessary to implement in
kernel space. Rarely accessed MSRs, MSRs that should be fine tunes against
certain CPU models and MSRs that contain information on the package level
are much better suited for user space to process. However, over time we have
accumulated a lot of MSRs that are not the first category, but still handled
by in-kernel KVM code.

This patch adds a generic interface to handle WRMSR and RDMSR from user
space. With this, any future MSR that is part of the latter categories can
be handled in user space.

Furthermore, it allows us to replace the existing "ignore_msrs" logic with
something that applies per-VM rather than on the full system. That way you
can run productive VMs in parallel to experimental ones where you don't care
about proper MSR handling.

Signed-off-by: Alexander Graf <graf@xxxxxxxxxx>


v1 -> v2:

- deflect all #GP injection events to user space, not just unknown MSRs.
That was we can also deflect allowlist errors later
- fix emulator case

v2 -> v3:

- return r if r == X86EMUL_IO_NEEDED
- Use complete_userspace_io logic instead of reply field
- Simplify trapping code
Documentation/virt/kvm/api.rst | 62 +++++++++++++++++++
arch/x86/include/asm/kvm_host.h | 6 ++
arch/x86/kvm/emulate.c | 18 +++++-
arch/x86/kvm/x86.c | 106 ++++++++++++++++++++++++++++++--
include/trace/events/kvm.h | 2 +-
include/uapi/linux/kvm.h | 10 +++
6 files changed, 197 insertions(+), 7 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 320788f81a05..79c3e2fdfae4 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst

The new exit reasons should probably be mentioned here (around line 4866):

.. note::

KVM_EXIT_EPR the corresponding

operations are complete (and guest state is consistent) only after userspace
has re-entered the kernel with KVM_RUN. The kernel side will first finish
incomplete operations and then check for pending signals. Userspace
can re-enter the guest with an unmasked signal pending to complete
pending operations.

Great catch, thanks! Updated to also include the two new exit reasons.

Other than that, my remaining comments are all nits. Feel free to ignore them.

+static int kvm_get_msr_user_space(struct kvm_vcpu *vcpu, u32 index)

Return bool rather than int?

I'm not a big fan of bool returning APIs unless they have an "is" in their name. In this case, the most readable path forward would probably be an enum:

enum kvm_msr_user_space_retval {

and then use that in the checks. But that adds a lot of boiler plate for a fully internal, only a few dozen LOC big API. I don't think it's worth it.

+ if (!vcpu->kvm->arch.user_space_msr_enabled)
+ return 0;
+ vcpu->run->exit_reason = KVM_EXIT_X86_RDMSR;
+ vcpu->run->msr.error = 0;

Should we clear 'pad' in case anyone can think of a reason to use this
space to extend the API in the future?

It can't hurt I guess.

+ vcpu->run->msr.index = index;
+ vcpu->arch.pending_user_msr = true;
+ vcpu->arch.complete_userspace_io = complete_emulated_rdmsr;

complete_userspace_io could perhaps be renamed to
complete_userspace_emulation (in a separate commit).

I think the complicated part of complete_userspace_io is to know it exists and understand how it works. Once you grasp these two bits, the name is just an artifact and IMHO easy enough to apply "beyond I/O".

+ return 1;
+static int kvm_set_msr_user_space(struct kvm_vcpu *vcpu, u32 index, u64 data)

Return bool rather than int?

Same replies as above :). I did get fed up with the amount of duplication though and created a generalized function in v4 that gets called by kvm_get/set_msr_user_space() to ensure that all fields are always set.

+ if (!vcpu->kvm->arch.user_space_msr_enabled)
+ return 0;
+ vcpu->run->exit_reason = KVM_EXIT_X86_WRMSR;
+ vcpu->run->msr.error = 0;

Same question about 'pad' as above.

+ vcpu->run->msr.index = index;
+ vcpu->run-> = data;
+ vcpu->arch.pending_user_msr = true;
+ vcpu->arch.complete_userspace_io = complete_emulated_wrmsr;
+ return 1;

Reviewed-by: Jim Mattson <jmattson@xxxxxxxxxx>

Thanks a bunch for the review :)


Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879