Re: [PATCH v19 110/130] KVM: TDX: Handle TDX PV MMIO hypercall

From: Binbin Wu
Date: Thu Jul 18 2024 - 03:34:38 EST




On 7/17/2024 6:25 AM, Isaku Yamahata wrote:
On Tue, Jun 25, 2024 at 02:54:09PM +0800,
Binbin Wu <binbin.wu@xxxxxxxxxxxxxxx> wrote:


On 2/26/2024 4:26 PM, isaku.yamahata@xxxxxxxxx wrote:
From: Sean Christopherson <sean.j.christopherson@xxxxxxxxx>

Export kvm_io_bus_read and kvm_mmio tracepoint and wire up TDX PV MMIO
hypercall to the KVM backend functions.

kvm_io_bus_read/write() searches KVM device emulated in kernel of the given
MMIO address and emulates the MMIO. As TDX PV MMIO also needs it, export
kvm_io_bus_read(). kvm_io_bus_write() is already exported. TDX PV MMIO
emulates some of MMIO itself. To add trace point consistently with x86
kvm, export kvm_mmio tracepoint.

Signed-off-by: Sean Christopherson <sean.j.christopherson@xxxxxxxxx>
Signed-off-by: Isaku Yamahata <isaku.yamahata@xxxxxxxxx>
Reviewed-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>
---
arch/x86/kvm/vmx/tdx.c | 114 +++++++++++++++++++++++++++++++++++++++++
arch/x86/kvm/x86.c | 1 +
virt/kvm/kvm_main.c | 2 +
3 files changed, 117 insertions(+)

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 55fc6cc6c816..389bb95d2af0 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1217,6 +1217,118 @@ static int tdx_emulate_io(struct kvm_vcpu *vcpu)
return ret;
}
+static int tdx_complete_mmio(struct kvm_vcpu *vcpu)
+{
+ unsigned long val = 0;
+ gpa_t gpa;
+ int size;
+
+ KVM_BUG_ON(vcpu->mmio_needed != 1, vcpu->kvm);
+ vcpu->mmio_needed = 0;
mmio_needed is used by instruction emulator to setup the complete callback.
Since TDX handle MMIO in a PV way, mmio_needed is not needed here.
Ok, we don't need to update mmio_needed.


+
+ if (!vcpu->mmio_is_write) {
It's also needed by instruction emulator, we can use
vcpu->run->mmio.is_write instead.
No because vcpu->run->mmio is shared with user space. KVM need to stash
it independently.


+ gpa = vcpu->mmio_fragments[0].gpa;
+ size = vcpu->mmio_fragments[0].len;
Since MMIO cross page boundary is not allowed according to the input checks
from TDVMCALL, these mmio_fragments[] is not needed.
Just use vcpu->run->mmio.phys_addr and vcpu->run->mmio.len?
ditto.


+
+ memcpy(&val, vcpu->run->mmio.data, size);
+ tdvmcall_set_return_val(vcpu, val);
+ trace_kvm_mmio(KVM_TRACE_MMIO_READ, size, gpa, &val);
+ }
Tracepoint for KVM_TRACE_MMIO_WRITE is missing when it is handled in
userspace.
tdx_mmio_write() has it before existing to the user space. It matches with
how write_mmio() behaves in x86.c.

Hmm, to match with other code, we should remove
trace_kvm_mmio(KVM_TRACE_MMIO_READ) and keep KVM_TRACE_MMIO_READ_UNSATISFIED
in tdx_emulate_mmio(). That's how read_prepare() and read_exit_mmio() behaves.

For MMIO read
- When kernel can handle the MMIO, KVM_TRACE_MMIO_READ with data.
- When exiting to the user space, KVM_TRACE_MMIO_READ_UNSATISFIED before
the exit. No trace after the user space handled the MMIO.

For MMIO read, in the emulator, there is still a trace after the userspace handled the MMIO.
In complete_emulated_mmio(), if all fragments have been handled, it will
set vcpu->mmio_read_completed to 1 and call complete_emulated_io().
complete_emulated_io
    kvm_emulate_instruction(vcpu, EMULTYPE_NO_DECODE)
        x86_emulate_instruction
            x86_emulate_insn
                emulator_read_write
                    read_prepare
                        At this point, vcpu->mmio_read_completed is 1,
                        it traces KVM_TRACE_MMIO_READ with data
                        and then clear vcpu->mmio_read_completed

So to align with emulator, we should keep the trace for KVM_TRACE_MMIO_READ.



For MMIO write
- KVM_TRACE_MMIO_WRITE before handling it.


Also, the return code is only set when the emulation is done in kernel, but
not set when it's handled in userspace.

+ return 1;
+}
How about the fixup as following:

@@ -1173,19 +1173,18 @@ static int tdx_emulate_io(struct kvm_vcpu *vcpu)
static int tdx_complete_mmio(struct kvm_vcpu *vcpu) { unsigned long val = 0;
- gpa_t gpa; - int size; - - vcpu->mmio_needed = 0; - - if
(!vcpu->mmio_is_write) { - gpa = vcpu->mmio_fragments[0].gpa; - size =
vcpu->mmio_fragments[0].len; + gpa_t gpa = vcpu->run->mmio.phys_addr; + int
size = vcpu->run->mmio.len; + if (vcpu->run->mmio.is_write) { +
trace_kvm_mmio(KVM_TRACE_MMIO_WRITE, size, gpa, &val); + } else {
memcpy(&val, vcpu->run->mmio.data, size); tdvmcall_set_return_val(vcpu,
val); trace_kvm_mmio(KVM_TRACE_MMIO_READ, size, gpa, &val); } + +
tdvmcall_set_return_code(vcpu, TDVMCALL_SUCCESS); return 1; }



+
+static inline int tdx_mmio_write(struct kvm_vcpu *vcpu, gpa_t gpa, int size,
+ unsigned long val)
+{
+ if (kvm_iodevice_write(vcpu, &vcpu->arch.apic->dev, gpa, size, &val) &&
+ kvm_io_bus_write(vcpu, KVM_MMIO_BUS, gpa, size, &val))
+ return -EOPNOTSUPP;
+
+ trace_kvm_mmio(KVM_TRACE_MMIO_WRITE, size, gpa, &val);
+ return 0;
+}
+
+static inline int tdx_mmio_read(struct kvm_vcpu *vcpu, gpa_t gpa, int size)
+{
+ unsigned long val;
+
+ if (kvm_iodevice_read(vcpu, &vcpu->arch.apic->dev, gpa, size, &val) &&
+ kvm_io_bus_read(vcpu, KVM_MMIO_BUS, gpa, size, &val))
+ return -EOPNOTSUPP;
+
+ tdvmcall_set_return_val(vcpu, val);
+ trace_kvm_mmio(KVM_TRACE_MMIO_READ, size, gpa, &val);
+ return 0;
+}
+
+static int tdx_emulate_mmio(struct kvm_vcpu *vcpu)
+{
+ struct kvm_memory_slot *slot;
+ int size, write, r;
+ unsigned long val;
+ gpa_t gpa;
+
+ KVM_BUG_ON(vcpu->mmio_needed, vcpu->kvm);
+
[...]
+
+ /* Request the device emulation to userspace device model. */
+ vcpu->mmio_needed = 1;
+ vcpu->mmio_is_write = write;
Then they can be dropped.
We may drop mmio_needed. mmio_is_write is needed as above.



+ vcpu->arch.complete_userspace_io = tdx_complete_mmio;
+
+ vcpu->run->mmio.phys_addr = gpa;
+ vcpu->run->mmio.len = size;
+ vcpu->run->mmio.is_write = write;
+ vcpu->run->exit_reason = KVM_EXIT_MMIO;
+
+ if (write) {
+ memcpy(vcpu->run->mmio.data, &val, size);
+ } else {
+ vcpu->mmio_fragments[0].gpa = gpa;
+ vcpu->mmio_fragments[0].len = size;
These two lines can be dropped as well.
ditto.