Re: [visorchipset] invalid opcode: 0000 [#1] PREEMPT SMP

From: Romer, Benjamin M
Date: Fri Apr 11 2014 - 09:52:01 EST


On Thu, 2014-04-10 at 19:28 -0700, H. Peter Anvin wrote:
> On 04/10/2014 06:19 AM, Romer, Benjamin M wrote:
> >
> > I'm confused by the intended behavior of KVM. Is the intention of the
> > -cpu switch to fully emulate a particular CPU? If that's the case, the
> > Intel documentation says bit 31 should always be 0, so the value
> > returned by the cpuid instruction isn't correct. If the intention is to
> > present a VM with a specific CPU architecture, the CPU ought to behave
> > as described in Intel's virtualization documentation and just vmexit
> > instead of faulting with invalid op, IMHO.
> >
> > I've already said the check in the code was insufficient, and I'm trying
> > to fix that part now. :)
> >
>
> I'm still confused where KVM comes into the picture. Are you actually
> using KVM (and thus talking about nested virtualization) or are you
> using Qemu in JIT mode and running another hypervisor underneath?

The test that Fengguang used to find the problem ran the Linux
kernel directly under KVM. When the kernel was run with
"-cpu Haswell,+smep,+smap", the vmcall failed with an invalid opcode,
but when it was run with "-cpu qemu64", the vmcall caused a vmexit, as
it should.

My point is, the vmcall was made because the hypervisor bit was set. If
this bit had been turned off, as it would be on a real processor, the
vmcall wouldn't have happened.

> The hypervisor bit is a complete red herring. If the guest CPU is
> running in VT-x mode, then VMCALL should VMEXIT inside the guest
> (invoking the guest root VT-x),

The CPU is running in VT-x. That was my point: the kernel is running in
the KVM guest, and KVM is setting the CPU feature bits such that bit 31
is enabled.

I don't think it's a red herring, because the kernel uses this bit
elsewhere: it is reported as X86_FEATURE_HYPERVISOR in the CPU
features and can be checked with the cpu_has_hypervisor macro (which
the original author of the driver code did not use, but should have).
The VMware and KVM support code in the kernel also checks this bit
before probing its hypervisor CPUID leaves for an ID. If the bit is not
set correctly, it affects more than just the s-Par drivers.

> but the fact still remains that you
> should never, ever, invoke VMCALL unless you know what hypervisor you
> have underneath.

From the standpoint of the s-Par drivers, yes, I agree (as I already
said). However, VMCALL is not a privileged instruction, so anyone could
use it from user space and go right past the OS straight to the
hypervisor. IMHO, making it *lethal* to the guest is a bad idea, since
any user could hard-stop the guest with a couple of lines of C.
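To illustrate the "couple of lines of C", here is a minimal, hypothetical probe that issues VMCALL from user space, assuming x86-64, GCC/Clang inline asm and POSIX signals. What happens is decided by the CPU and whatever hypervisor sits underneath, not by the guest kernel. Per Intel's SDM, outside VMX operation VMCALL raises #UD (SIGILL on Linux), in VMX root operation at CPL > 0 it raises #GP (SIGSEGV), and in VMX non-root operation it always causes a VM exit, after which the hypervisor chooses whether to resume, inject a fault, or stop the guest. The signal handlers below only keep the probe process itself alive; vmcall_traps is an illustrative name.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf vmcall_env;

static void on_trap(int sig)
{
    (void)sig;
    siglongjmp(vmcall_env, 1);
}

/* Execute one VMCALL from user space (CPL 3) and report whether it
 * trapped back to this process: 1 if a fault arrived as SIGILL (#UD)
 * or SIGSEGV (#GP), 0 if the instruction returned normally. */
static int vmcall_traps(void)
{
    struct sigaction sa, old_ill, old_segv;
    volatile int trapped = 0;
    unsigned long rax = 0; /* hypercall number: arbitrary for the probe */

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_trap;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGILL, &sa, &old_ill);
    sigaction(SIGSEGV, &sa, &old_segv);

    /* savemask=1 so siglongjmp restores the mask, unblocking the signal */
    if (sigsetjmp(vmcall_env, 1) == 0)
        __asm__ volatile("vmcall" : "+a"(rax) : : "memory");
    else
        trapped = 1;

    /* Put the caller's handlers back before returning. */
    sigaction(SIGILL, &old_ill, NULL);
    sigaction(SIGSEGV, &old_segv, NULL);
    return trapped;
}
```

The probe restores the previous handlers so it can be dropped into a larger program; without them, an unprivileged process would simply be killed by the signal, while the guest's fate is left entirely to the hypervisor's handling of the VM exit.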

-- Ben