On 07/24/2012 01:09 PM, Vladimir Davydov wrote:
> On 07/24/2012 02:10 PM, Borislav Petkov wrote:
>> On Tue, Jul 24, 2012 at 12:29:19PM +0400, Vladimir Davydov wrote:
>>> I guess that when the more advanced features become widely-used,
>>> vendors will offer new MSRs and/or CPUID faulting.
>>
>> And this right there is the dealbreaker:
>>
>> So what are you doing for cpus which have the advanced CPUID features
>> leafs but there are no MSRs to turn those bits off?
>
> We have not encountered this situation in our environments and I hope we
> won't :-)
>
> But look, these CPUID functions cover majority of CPU features, don't
> they? So, most of "normal" apps inside VM will survive migration.
> Perhaps, some low-level utils won't. I guess that's why there are no
> MSRs for other levels provided by vendors.

You have the new feature leaf at EAX=7. This contains things like BMI
and AVX2 and probably more upcoming features.
So you may be safe for a while, but you need a solution in the long run.

>> You surely need some software-only solution for the migration to work,
>> no?
>
> Yes.
>
>> If so, why not apply that solution to your hypervisor without touching
>> the kernel at all?
>
> In most hypervisor-based virtualization products, this is already
> implemented using VMM-exits, so that each VM can have arbitrary CPUID
> mask set by the admin.
>
> The problem is that we have no hypervisor. "Virtualization" we want this
> feature for is based on cgroups and namespaces (examples are OpenVZ and
> mainstream LXC). Tasks are just grouped into virtual environments and
> share the same kernel, which is proved to be more memory usage efficient
> than traditional hypervisor-based approaches.

So for this single kernel approach I'd understand it that way:
1. You boot up the kernel on the host, it should detect and enable all
the features, say MCA.
2. After boot, you use /src/msr-tools/wrmsr to mask CPUID bits, again
MCA for instance or AVX/AES or the like.
Since the (host side of the) kernel already detected it, this does not
hurt the kernel features like MCA. But AVX will not be available to
applications running in the "host container", which is probably OK since
these are mostly management applications, right?
3. Then you start guests. The guest's libc will not detect the features
because of the MSR masking. All you need now is /proc/cpuinfo filtering
to make this bullet-proof, preferably through the container
functionality. I see that you already do massive sysfs filtering and
also /proc/<pid> filtering, so this may be an option?
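
To make step 2 concrete, here is a rough C sketch of what a wrmsr-style tool does, assuming the msr driver is loaded and exposes /dev/cpu/<n>/msr. The MSR number 0xC0011004 and its layout (Fn0000_0001 ECX bits in the upper 32 bits, EDX bits in the lower) are my assumption for AMD parts and need to be checked against the BKDG for the exact family. The function names are made up for illustration.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumption: AMD CPUID Fn0000_0001 feature override MSR; verify per family. */
#define MSR_CPUID_FN1_MASK 0xC0011004u

/* CPUID Fn0000_0001 ECX feature bits (architectural positions). */
#define FN1_ECX_AES (1u << 25)
#define FN1_ECX_AVX (1u << 28)

/* Return msr_val with the given ECX feature bits cleared.
 * Assumes the ECX bits sit in the upper 32 bits of the MSR. */
static uint64_t clear_ecx_features(uint64_t msr_val, uint32_t ecx_bits)
{
    return msr_val & ~((uint64_t)ecx_bits << 32);
}

/* Open the msr device of one CPU; returns an fd or -1. */
static int open_msr(int cpu, int flags)
{
    char path[64];

    snprintf(path, sizeof(path), "/dev/cpu/%d/msr", cpu);
    return open(path, flags);
}

/* Read-modify-write the mask MSR on one CPU.
 * Returns 0 on success, -1 on any error (no device, no permission, ...). */
static int mask_ecx_features_on_cpu(int cpu, uint32_t ecx_bits)
{
    uint64_t val;
    int ret = -1;
    int fd = open_msr(cpu, O_RDWR);

    if (fd < 0)
        return -1;
    /* The msr driver uses the file offset as the MSR number. */
    if (pread(fd, &val, sizeof(val), (off_t)MSR_CPUID_FN1_MASK) ==
        (ssize_t)sizeof(val)) {
        val = clear_ecx_features(val, ecx_bits);
        if (pwrite(fd, &val, sizeof(val), (off_t)MSR_CPUID_FN1_MASK) ==
            (ssize_t)sizeof(val))
            ret = 0;
    }
    close(fd);
    return ret;
}
```

This would have to be run as root once per logical CPU, e.g. mask_ecx_features_on_cpu(cpu, FN1_ECX_AVX | FN1_ECX_AES) for every cpu, before any guest is started. It does nothing for the leaf at EAX=7.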
This approach does not need any kernel support (except for the
/proc/cpuinfo filtering). Does this address the issues you have?
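
To illustrate what I mean by /proc/cpuinfo filtering, here is a small userspace sketch in C that drops hidden feature names from a "flags" line. It only shows the intended effect: the function names and the buffer interface are hypothetical, and a real implementation would live in the kernel's container code, not in a helper like this.

```c
#include <stddef.h>
#include <string.h>

/* Return 1 if the token (tok, len) exactly matches one hidden flag name. */
static int is_hidden(const char *tok, size_t len,
                     const char *const *hidden, size_t nhidden)
{
    for (size_t i = 0; i < nhidden; i++)
        if (strlen(hidden[i]) == len && memcmp(hidden[i], tok, len) == 0)
            return 1;
    return 0;
}

/* Copy one cpuinfo line into out, dropping hidden names from a
 * "flags : ..." line. Other lines are copied unchanged. A filtered
 * flags line is written without its trailing newline.
 * Returns 0 on success, -1 if out is too small. */
static int filter_flags_line(const char *line,
                             const char *const *hidden, size_t nhidden,
                             char *out, size_t outsz)
{
    const char *colon = strchr(line, ':');

    if (strncmp(line, "flags", 5) != 0 || colon == NULL) {
        if (strlen(line) >= outsz)
            return -1;
        strcpy(out, line);
        return 0;
    }

    size_t used = (size_t)(colon - line) + 1;  /* keep the "flags...:" prefix */
    if (used >= outsz)
        return -1;
    memcpy(out, line, used);

    const char *p = colon + 1;
    while (*p) {
        p += strspn(p, " \t\n");               /* skip separators */
        size_t len = strcspn(p, " \t\n");      /* length of next flag name */
        if (len == 0)
            break;
        if (!is_hidden(p, len, hidden, nhidden)) {
            if (used + 1 + len >= outsz)
                return -1;
            out[used++] = ' ';
            memcpy(out + used, p, len);
            used += len;
        }
        p += len;
    }
    out[used] = '\0';
    return 0;
}
```

Names are matched exactly, so hiding "avx" does not also hide "avx2". The hidden list would have to be the same set of features that was masked via the MSRs, otherwise /proc/cpuinfo and CPUID disagree inside the container.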
Regards,
Andre.