Re: ABI coupling to hypervisors via CONFIG_PARAVIRT
From: Zachary Amsden
Date: Fri Mar 09 2007 - 18:07:53 EST
Linus Torvalds wrote:
but ... maybe because VMI is so lowlevel and covers /all/ of x86 today,
it will always be able to emulate whatever different concept we can come
up with? Do we really know this absolutely sure?
"For sure"? Absolutely not. But since any new interfaces we come up with
for doing timers etc had better work perfectly fine on an old hardware
platform too, we can't exactly require any interfaces that do things that
a bog-standard old dual-PPro didn't do 10 years ago, can we?
So assumign that the VMI interface is roughly as powerful (by virtue of
basically emulating it) as the old single-ioapic/single-lapic systems we
used to use, I don't think it should ever be a real problem. Hmm?
Sorry to keep you in a thread you don't want more to do with Linus, but
to answer the question completely directly:
There are four design requirement which are inviolate for achieving a
large measurable performance gain, which is the primary benefit of i386
paravirt-ops for us. These changes are required for /ALL/ high
performance paravirtualized kernels not running in hardware
virtualization, across a broad spectrum of hypervisors, and have either
zero negative impact or opportunity for additional gain when hardware
virtualization is enabled.
1) Ability to run the kernel at non-zero CPL
2) Ability to replace hardware interrupt masking functions with
virtualizable equivalents
3) Notification when page tables are allocated and released
4) Notification in some form when page table entries are updated (or
vma's are changed)
Everything after this are incremental gains, some more valuable than
others, but not as major in significance as the above four (apic_write,
incidentally, _is_ one of the more substantial gains for us).
These don't seem to be a major burden on the kernel at all.
#1 is already the default case now for even native hardware.
#2 requires a lot of hooks because interrupt masking is a common
function, and this is where the large numbers of hook points Ingo was
demonstrating came from, but these icache effects and costs are on
already expensive instructions. In fact, it appears on some hardware,
the nop padding around cli / sti / pushf / popf contributes to a
mysterious performance gain, perhaps due to some pipelining anomaly.
#3 is not a common enough operation to be of performance concern. It
does however, require pagetables, just as native hardware does. Which
we can implement perfectly well anyway in our backend, just as the
native backend would if some reckless madman removed all notions of page
tables from a paravirt kernel.
#4 involves an extra call in page fault paths and from some points in
the mm layer.
There are no ABI requirements tied to these, merely the presence of any
usable API for them in paravirt-ops.
Linus is right - our virtual hardware is an exact replica of real
hardware. So no matter how you change paravirtualization in the kernel,
anything that runs on real hardware will continue to run on VMware. VMI
is tied very closely to the hardware, on purpose, and follows the rules
of native hardware extremely closely. So you can pretty much twist and
abuse paravirt-ops in a number of ways, and as long as it continues to
run on real hardware with the above four requirements, it still runs
even on VMI. Violate the above four requirements, and it costs a lot of
performance, but we still continue to run.
Zach
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/