Re: [PATCH] Add MCE support to KVM

From: Avi Kivity
Date: Mon Apr 20 2009 - 09:45:31 EST


Gerd Hoffmann wrote:
On 04/20/09 14:43, Avi Kivity wrote:
Gerd Hoffmann wrote:
That said, I'd like to be able to emulate the Xen HVM hypercalls. But in
any case, the hypercall implementation has to be in the kernel,

No. With Xenner the Xen hypercall emulation code lives in guest
address space.

In this case the guest ring-0 code should trap the #GP, and install the
hypercall page (which uses sysenter/syscall?). No kvm or qemu changes
needed.

Doesn't fly.

Reason #1: In the pv-on-hvm case the guest runs on ring0.

Sure, in this case you need to trap the MSR in the kernel (or qemu). But the handler is no longer in the guest address space, and you do need to update the opcode.
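To make "install the hypercall page" concrete, here is a minimal sketch of how a host-side handler might fill the page once the guest writes its address to the hypercall MSR. It assumes the Xen-style layout of one 32-byte slot per hypercall number, each containing `mov $nr, %eax`, the vendor's hypercall instruction, and `ret`. One reading of "update the opcode" is that the stub must use `vmcall` (0f 01 c1) on Intel and `vmmcall` (0f 01 d9) on AMD, so the sketch takes the vendor as a parameter. The names `fill_hypercall_page`, `HC_PAGE_SIZE`, and `enum cpu_vendor` are hypothetical, and any special-cased slots are ignored.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HC_PAGE_SIZE 4096
#define HYPERCALL_SLOT_SIZE 32   /* assumed: one 32-byte stub per hypercall number */

enum cpu_vendor { VENDOR_INTEL, VENDOR_AMD };   /* hypothetical */

/*
 * Hypothetical helper: fill a guest hypercall page with stubs of the form
 *     mov $nr, %eax ; vmcall (Intel) or vmmcall (AMD) ; ret
 * A host might call this when the guest writes the page address to the
 * hypercall MSR, and again if the opcode has to be switched.
 */
static void fill_hypercall_page(uint8_t page[HC_PAGE_SIZE], enum cpu_vendor vendor)
{
    memset(page, 0xcc, HC_PAGE_SIZE);        /* pad unused bytes with int3 */

    for (uint32_t nr = 0; nr < HC_PAGE_SIZE / HYPERCALL_SLOT_SIZE; nr++) {
        uint8_t *p = page + (size_t)nr * HYPERCALL_SLOT_SIZE;

        p[0] = 0xb8;                         /* mov $imm32, %eax */
        p[1] = nr & 0xff;                    /* imm32, little-endian */
        p[2] = (nr >> 8) & 0xff;
        p[3] = (nr >> 16) & 0xff;
        p[4] = (nr >> 24) & 0xff;
        p[5] = 0x0f;                         /* vmcall  = 0f 01 c1 */
        p[6] = 0x01;                         /* vmmcall = 0f 01 d9 */
        p[7] = (vendor == VENDOR_INTEL) ? 0xc1 : 0xd9;
        p[8] = 0xc3;                         /* ret */
    }
}
```

Because the stubs are plain data in guest memory, switching vendors only means rewriting one byte per slot; the guest keeps calling the same addresses.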

Let's not confuse the two cases.

Reason #2: Chicken-egg issue: For the pv-on-hvm case only a few
simple hypercalls are needed. The code to handle them
is small enough that it can be loaded directly into the
hypercall page(s).

Please elaborate. What hypercalls are so simple that an exit into the hypervisor is not necessary?

Is there any reason to? I *think* Xen does it for better scheduling
latency. But with Xen emulation sitting in guest address space we can
schedule the guest at will anyway.

It also improves latency within the guest itself. At least I think that's
what the Hyper-V spec is saying. You can interrupt the execution of
a long hypercall, inject an interrupt, and resume. Sort of like a
rep movs instruction, which the CPU can and will interrupt.

Hmm. Needs investigation. I'd expect the main source of latency to be page table walking. Xen works very differently from kvm+xenner here...

kvm is mostly O(1). We need to limit rmap chains, but we're fairly close. The kvm paravirt mmu calls are not O(1), but we can easily use continuations there (and they're disabled on newer processors anyway).
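A continuation-style hypercall can be sketched the way the CPU handles `rep movs`: process a bounded chunk of work per exit, keep the progress in guest registers, and leave the guest instruction pointer on the hypercall instruction until everything is done. If an interrupt is pending, it is injected first; the guest's handler returns to the hypercall, which simply re-executes and picks up where it left off. This is a minimal simulation under stated assumptions: the register convention (remaining count in `rcx`, next index in `rdx`), the per-exit budget of 64 items, and the names `struct vcpu`, `handle_long_hypercall`, and `do_one_item` are all hypothetical and not kvm's actual interfaces.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified vCPU state: only what the sketch needs. */
struct vcpu {
    uint64_t rip;        /* guest instruction pointer */
    uint64_t rcx;        /* remaining work items, updated in place */
    uint64_t rdx;        /* index of the next item to process */
    bool irq_pending;    /* an interrupt is waiting to be injected */
    unsigned injected;   /* number of interrupts injected (illustration only) */
};

#define HYPERCALL_INSN_LEN 3   /* vmcall and vmmcall are both 3 bytes */
#define WORK_BUDGET 64         /* hypothetical per-exit work limit */

/* Hypothetical per-item operation, e.g. one paravirt MMU update. */
static void do_one_item(uint64_t *items, uint64_t idx)
{
    items[idx] += 1;
}

/* Returns true once the whole hypercall has completed. */
static bool handle_long_hypercall(struct vcpu *v, uint64_t *items)
{
    unsigned done = 0;

    /* Bounded chunk: at most WORK_BUDGET items per exit. */
    while (v->rcx > 0 && done < WORK_BUDGET) {
        do_one_item(items, v->rdx);
        v->rdx++;
        v->rcx--;
        done++;
    }

    if (v->rcx == 0) {
        v->rip += HYPERCALL_INSN_LEN;   /* finished: step past the hypercall */
        return true;
    }

    /* Partial: rip stays on the hypercall so the guest re-executes it.
     * Progress survives in rcx/rdx, so an interrupt can be taken first. */
    if (v->irq_pending) {
        v->irq_pending = false;
        v->injected++;
    }
    return false;
}
```

The worst-case latency per exit is bounded by `WORK_BUDGET` rather than by the size of the guest's request, which is the property the O(1) discussion is after.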

Another area that worries me is virtio notification, which can take a long time. It won't be trivial, but we can make it work:

- for the existing pio-to-userspace notification, add a bit that tells the kernel to repeat the instruction instead of continuing. The 'outl' instruction is idempotent, so we can do partial work and return to the kernel.
- if using hypercallfd/piofd to a pipe, we're offloading everything to another thread anyway, so we can return immediately
- if using hypercallfd/piofd to a kernel virtio server, it can return 0 bytes written, indicating it needs a retry. kvm can try to inject an interrupt if it sees this.

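The retry scheme in the list above can be sketched as follows, assuming a notification sink that behaves like `write()`: it returns the number of bytes it accepted, and 0 means "call me again". On a retry, kvm leaves the guest instruction pointer on the `outl` (safe because the write is idempotent) and uses the opportunity to inject a pending interrupt; on success it steps past the instruction. A pipe-backed sink would always accept the write and never take the retry path. `hypercallfd`/`piofd` are only proposals here, and every name in the sketch (`notify_fn`, `complete_pio_out`, `struct vq_server`, `vq_server_notify`) is hypothetical; the instruction length is passed in, as it would come from the exit information.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified vCPU state. */
struct vcpu {
    uint64_t rip;        /* guest instruction pointer */
    bool irq_pending;    /* an interrupt is waiting to be injected */
    unsigned injected;   /* number of interrupts injected (illustration only) */
};

/* Hypothetical notification sink: returns bytes accepted, 0 = retry later. */
typedef size_t (*notify_fn)(void *ctx, uint32_t value);

/* Complete an outl exit; insn_len would come from the exit information. */
static void complete_pio_out(struct vcpu *v, unsigned insn_len,
                             notify_fn notify, void *ctx, uint32_t value)
{
    if (notify(ctx, value) > 0) {
        v->rip += insn_len;   /* done: continue after the outl */
        return;
    }
    /* Retry requested: leave rip on the (idempotent) outl so the guest
     * re-executes it, and inject a pending interrupt in the meantime. */
    if (v->irq_pending) {
        v->irq_pending = false;
        v->injected++;
    }
}

/* Hypothetical in-kernel virtio server: drains at most 'budget' buffers per
 * notification and reports 0 bytes written while work remains. */
struct vq_server {
    unsigned pending;    /* buffers still to process */
    unsigned budget;     /* buffers processed per notification */
};

static size_t vq_server_notify(void *ctx, uint32_t queue_index)
{
    struct vq_server *s = ctx;
    (void)queue_index;   /* single queue in this sketch */

    unsigned n = s->pending < s->budget ? s->pending : s->budget;
    s->pending -= n;
    return s->pending == 0 ? sizeof(uint32_t) : 0;
}
```

With 10 pending buffers and a budget of 4, the guest's `outl` executes three times: two retries (6 and then 2 buffers left) and a final completion.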

--
error compiling committee.c: too many arguments to function
