On Fri, Jul 31, 2020 at 2:50 PM Alexander Graf <graf@xxxxxxxxxx> wrote:
MSRs are weird. Some of them are normal control registers, such as EFER.
Some however are registers that really are model specific, not very
interesting to virtualization workloads, and not performance critical.
Others again are really just windows into package configuration.
Out of these MSRs, only the first category is necessary to implement in
kernel space. Rarely accessed MSRs, MSRs that should be fine tunes against
certain CPU models and MSRs that contain information on the package level
are much better suited for user space to process. However, over time we have
accumulated a lot of MSRs that are not the first category, but still handled
by in-kernel KVM code.
This patch adds a generic interface to handle WRMSR and RDMSR from user
space. With this, any future MSR that is part of the latter categories can
be handled in user space.
Furthermore, it allows us to replace the existing "ignore_msrs" logic with
something that applies per-VM rather than on the full system. That way you
can run productive VMs in parallel to experimental ones where you don't care
about proper MSR handling.
Signed-off-by: Alexander Graf <graf@xxxxxxxxxx>
---
v1 -> v2:
- s/ETRAP_TO_USER_SPACE/ENOENT/g
- deflect all #GP injection events to user space, not just unknown MSRs.
That was we can also deflect allowlist errors later
- fix emulator case
v2 -> v3:
- return r if r == X86EMUL_IO_NEEDED
- s/KVM_EXIT_RDMSR/KVM_EXIT_X86_RDMSR/g
- s/KVM_EXIT_WRMSR/KVM_EXIT_X86_WRMSR/g
- Use complete_userspace_io logic instead of reply field
- Simplify trapping code
---
Documentation/virt/kvm/api.rst | 62 +++++++++++++++++++
arch/x86/include/asm/kvm_host.h | 6 ++
arch/x86/kvm/emulate.c | 18 +++++-
arch/x86/kvm/x86.c | 106 ++++++++++++++++++++++++++++++--
include/trace/events/kvm.h | 2 +-
include/uapi/linux/kvm.h | 10 +++
6 files changed, 197 insertions(+), 7 deletions(-)
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 320788f81a05..79c3e2fdfae4 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
The new exit reasons should probably be mentioned here (around line 4866):
.. note::
For KVM_EXIT_IO, KVM_EXIT_MMIO, KVM_EXIT_OSI, KVM_EXIT_PAPR and
KVM_EXIT_EPR the corresponding
operations are complete (and guest state is consistent) only after userspace
has re-entered the kernel with KVM_RUN. The kernel side will first finish
incomplete operations and then check for pending signals. Userspace
can re-enter the guest with an unmasked signal pending to complete
pending operations.
Other than that, my remaining comments are all nits. Feel free to ignore them.
+static int kvm_get_msr_user_space(struct kvm_vcpu *vcpu, u32 index)
Return bool rather than int?
+{
+ if (!vcpu->kvm->arch.user_space_msr_enabled)
+ return 0;
+
+ vcpu->run->exit_reason = KVM_EXIT_X86_RDMSR;
+ vcpu->run->msr.error = 0;
Should we clear 'pad' in case anyone can think of a reason to use this
space to extend the API in the future?
+ vcpu->run->msr.index = index;
+ vcpu->arch.pending_user_msr = true;
+ vcpu->arch.complete_userspace_io = complete_emulated_rdmsr;
complete_userspace_io could perhaps be renamed to
complete_userspace_emulation (in a separate commit).
+
+ return 1;
+}
+
+static int kvm_set_msr_user_space(struct kvm_vcpu *vcpu, u32 index, u64 data)
Return bool rather than int?
+{
+ if (!vcpu->kvm->arch.user_space_msr_enabled)
+ return 0;
+
+ vcpu->run->exit_reason = KVM_EXIT_X86_WRMSR;
+ vcpu->run->msr.error = 0;
Same question about 'pad' as above.
+ vcpu->run->msr.index = index;
+ vcpu->run->msr.data = data;
+ vcpu->arch.pending_user_msr = true;
+ vcpu->arch.complete_userspace_io = complete_emulated_wrmsr;
+
+ return 1;
+}
+
Reviewed-by: Jim Mattson <jmattson@xxxxxxxxxx>