Re: [PATCH v4] kvm: better MWAIT emulation for guests

From: Gabriel L. Somlo
Date: Wed Mar 15 2017 - 16:29:21 EST


On Wed, Mar 15, 2017 at 09:13:49PM +0100, Radim KrÄmÃÅ wrote:
> 2017-03-15 21:28+0200, Michael S. Tsirkin:
> > Guests running Mac OS 5, 6, and 7 (Leopard through Lion) have a problem:
> > unless explicitly provided with kernel command line argument
> > "idlehalt=0" they'd implicitly assume MONITOR and MWAIT availability,
> > without checking CPUID.
> >
> > We currently emulate that as a NOP but on VMX we can do better: let
> > guest stop the CPU until timer, IPI or memory change. CPU will be busy
> > but that isn't any worse than a NOP emulation.
> >
> > Note that mwait within guests is not the same as on real hardware
> > because halt causes an exit while mwait doesn't. For this reason it
> > might not be a good idea to use the regular MWAIT flag in CPUID to
> > signal this capability. Add a flag in the hypervisor leaf instead.
> >
> > Additionally, we add a capability for QEMU - e.g. if it knows there's an
> > isolated CPU dedicated for the VCPU it can set the standard MWAIT flag
> > to improve guest behaviour.
> >
> > Reported-by: "Gabriel L. Somlo" <gsomlo@xxxxxxxxx>
> > Signed-off-by: Michael S. Tsirkin <mst@xxxxxxxxxx>
> > ---
> >
> > Note: SVM bits are untested at this point. Seems pretty
> > obvious though.
> >
> > changes from v3:
> > - don't enable capability if cli+mwait blocks interrupts
> > - doc typo fixes (drop drop ppc doc)
> >
> > changes from v2:
> > - add a capability to allow host userspace to detect new kernels
> > - more documentation to clarify the semantics of the feature flag
> > and why it's useful
> > - svm support as suggested by Radim
> >
> > changes from v1:
> > - typo fix resulting in rest of leaf flags being overwritten
> > Reported by: Wanpeng Li <kernellwp@xxxxxxxxx>
> > - updated commit log with data about guests helped by this feature
> > - better document differences between mwait and halt for guests
> >
> > diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> > @@ -212,4 +213,28 @@ static inline u64 nsec_to_cycles(struct kvm_vcpu *vcpu, u64 nsec)
> > __rem; \
> > })
> >
> > +static bool kvm_mwait_in_guest(void)
> > +{
> > + unsigned int eax, ebx, ecx;
> > +
> > + if (!cpu_has(&boot_cpu_data, X86_FEATURE_MWAIT))
> > + return -ENODEV;
> > +
> > + if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
> > + return -ENODEV;
> > +
> > + /*
> > + * Intel CPUs without CPUID5_ECX_INTERRUPT_BREAK are problematic as
> > + * they would allow guest to stop the CPU completely by disabling
> > + * interrupts then invoking MWAIT.
> > + */
> > + if (boot_cpu_data.cpuid_level < CPUID_MWAIT_LEAF)
> > + return -ENODEV;
> > +
> > + cpuid(CPUID_MWAIT_LEAF, &eax, &ebx, &ecx, &mwait_substates);
> > +
> > + if (!(ecx & CPUID5_ECX_INTERRUPT_BREAK))
> > + return -ENODEV;
>
> The guest is still able to set ecx=0 with MWAIT, which should be the
> same as not having the CPUID flag, so I'm wondering how this check
> prevents anything harmful ... is it really a cpu "feature"?
>
> If we somehow report ecx bit 1 in CPUID[5], then the guest might try to
> set ecx bit 0 for MWAIT, which will cause #GP(0) and could explain the
> hang that Gabriel is hitting.
>
> Gabriel,
>
> - do you see VM exits on the "hung" VCPU?

how would I go about looking ?

> - what is your CPU model?

$ cat /proc/cpuinfo
...
processor : 3
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Xeon(R) CPU 5150 @ 2.66GHz
stepping : 6
microcode : 0xd2
cpu MHz : 2659.966
cache size : 4096 KB
physical id : 3
siblings : 2
core id : 0
cpu cores : 2
apicid : 6
initial apicid : 6
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl cpuid aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca lahf_lm tpr_shadow dtherm
bugs :
bogomips : 5320.04
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:

(this is 2x dual-core Xeon on a Mac Pro 1,1 -- all I had to spare for
testing, to avoid having to reboot my primary desktop :)

> - what do you get after running this C program on host and guest?
>
> #include <stdint.h>
> #include <stdio.h>
>
> int main(void) {
> uint32_t eax = 5, ebx, ecx = 0, edx;
> asm ("cpuid" : "+a"(eax), "=b"(ebx), "+c"(ecx), "=d"(edx));
>
> printf("eax=%#08x ebx=%#08x ecx=%#08x edx=%#08x\n", eax, ebx, ecx, edx);
>
> return 0;
> }

eax=0x000040 ebx=0x000040 ecx=0x000003 edx=0x000020

HTH,
--G