Re: ABI coupling to hypervisors via CONFIG_PARAVIRT

From: Zachary Amsden
Date: Fri Mar 09 2007 - 18:07:53 EST


Linus Torvalds wrote:
> > but ... maybe because VMI is so low-level and covers /all/ of x86 today, it will always be able to emulate whatever different concept we can come up with? Do we really know this for sure?
>
> "For sure"? Absolutely not. But since any new interfaces we come up with for doing timers etc. had better work perfectly fine on an old hardware platform too, we can't exactly require any interfaces that do things that a bog-standard old dual-PPro didn't do 10 years ago, can we?
>
> So assuming that the VMI interface is roughly as powerful (by virtue of basically emulating it) as the old single-ioapic/single-lapic systems we used to use, I don't think it should ever be a real problem. Hmm?

Sorry to keep you in a thread you want nothing more to do with, Linus, but to answer the question directly:

There are four design requirements which are inviolate for achieving a large, measurable performance gain, which is the primary benefit of i386 paravirt-ops for us. These changes are required for /ALL/ high-performance paravirtualized kernels not running under hardware virtualization, across a broad spectrum of hypervisors, and they have either zero negative impact or an opportunity for additional gain when hardware virtualization is enabled. (A rough sketch of the kind of hooks these map to follows the list.)

1) Ability to run the kernel at non-zero CPL
2) Ability to replace hardware interrupt masking functions with virtualizable equivalents
3) Notification when page tables are allocated and released
4) Notification in some form when page table entries are updated (or vmas are changed)
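
To make these concrete, here is a minimal, self-contained C sketch of the kind of hooks the four requirements map to, with a trivial native backend. The struct and field names are illustrative approximations only, not the exact paravirt-ops layout from any kernel release:

/* Illustrative sketch: these names approximate the kind of hooks
 * the four requirements map to; they are not the exact paravirt-ops
 * layout from any kernel release. */
typedef unsigned long pte_t;            /* stand-in for the kernel's pte_t */

struct pv_hooks {
    /* #1: privilege level the kernel runs at; a hypervisor
     * backend sets a non-zero RPL here. */
    unsigned int kernel_rpl;

    /* #2: virtualizable replacements for cli/sti/pushf/popf */
    void (*irq_disable)(void);
    void (*irq_enable)(void);
    unsigned long (*save_fl)(void);
    void (*restore_fl)(unsigned long flags);

    /* #3: notification when page table pages come and go */
    void (*alloc_pt)(unsigned long pfn);
    void (*release_pt)(unsigned long pfn);

    /* #4: notification when a page table entry is updated */
    void (*set_pte)(pte_t *ptep, pte_t pteval);
};

/* Native backend: #2 maps straight onto the real instructions,
 * #3 is a no-op, and #4 is a direct memory write. */
static void native_irq_disable(void) { __asm__ volatile("cli"); }
static void native_irq_enable(void)  { __asm__ volatile("sti"); }

static unsigned long native_save_fl(void)
{
    unsigned long flags;
    __asm__ volatile("pushf; pop %0" : "=r" (flags));
    return flags;
}

static void native_restore_fl(unsigned long flags)
{
    __asm__ volatile("push %0; popf" : : "r" (flags) : "memory", "cc");
}

static void native_pt_nop(unsigned long pfn) { (void)pfn; }
static void native_set_pte(pte_t *ptep, pte_t pteval) { *ptep = pteval; }

static struct pv_hooks pv = {
    .kernel_rpl  = 0,                   /* ring 0 on bare hardware */
    .irq_disable = native_irq_disable,
    .irq_enable  = native_irq_enable,
    .save_fl     = native_save_fl,
    .restore_fl  = native_restore_fl,
    .alloc_pt    = native_pt_nop,
    .release_pt  = native_pt_nop,
    .set_pte     = native_set_pte,
};

A hypervisor backend fills in the same slots with its own implementations; the kernel proper never needs to know which one it got.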

Everything beyond these is an incremental gain, some more valuable than others, but none as major in significance as the above four (apic_write, incidentally, _is_ one of the more substantial gains for us).
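
As an example of one of those incremental gains, here is a hedged sketch of why hooking apic_write pays off. The hypervisor_apic_write entry point below is hypothetical, assumed just for illustration; the point is that a plain MMIO store to an emulated APIC page forces a trap on every write, while a paravirt backend can hand the write over directly:

typedef unsigned int u32;

/* Hypothetical hypervisor entry point, assumed for illustration. */
extern void hypervisor_apic_write(unsigned long reg, u32 val);

static volatile u32 *apic_base;     /* mapped local APIC, set up elsewhere */

/* Native: a plain store to the memory-mapped APIC register. */
static void native_apic_write(unsigned long reg, u32 val)
{
    apic_base[reg / sizeof(u32)] = val;
}

/* Paravirt: skip the trap-and-emulate cycle on the APIC page and
 * tell the hypervisor about the write directly. */
static void paravirt_apic_write(unsigned long reg, u32 val)
{
    hypervisor_apic_write(reg, val);
}

/* Either way, the kernel calls through a single pointer. */
static void (*apic_write)(unsigned long reg, u32 val) = native_apic_write;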

These don't seem to be a major burden on the kernel at all.

#1 is already the default case now, even on native hardware.
#2 requires a lot of hooks because interrupt masking is a common operation; this is where the large number of hook points Ingo was demonstrating came from, but those icache effects and costs land on already expensive instructions. In fact, on some hardware the nop padding around cli / sti / pushf / popf appears to contribute a mysterious performance gain, perhaps due to some pipelining anomaly (a sketch of this patching scheme follows the list).
#3 is not a common enough operation to be of performance concern. It does, however, require page tables, just as native hardware does, and we can implement those perfectly well in our backend anyway, just as the native backend would have to if some reckless madman removed all notion of page tables from a paravirt kernel.
#4 involves an extra call in page fault paths and from some points in the mm layer.
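
To make the #2 machinery concrete, here is a rough, stand-alone sketch of the call-site patching idea (the record layout and slot widths are invented for the example; only the cli/sti opcodes and the 0x90 nop are real x86): every cli or sti site is emitted as a fixed-width slot, and on native hardware boot-time patching overwrites each slot with the raw instruction plus the nop padding mentioned above.

#include <string.h>

/* Illustrative sketch of boot-time call-site patching; the record
 * layout and slot widths are invented for the example. */
enum pv_op { PV_IRQ_DISABLE, PV_IRQ_ENABLE };

struct patch_site {
    unsigned char *addr;    /* start of the patchable slot in .text */
    unsigned char len;      /* slot width in bytes, e.g. 8 */
    enum pv_op op;          /* which operation lives at this site */
};

/* On native hardware, drop the indirect call and leave the raw
 * one-byte instruction followed by nop fill; that fill is the
 * "nop padding around cli / sti" mentioned above. */
static void patch_site_native(struct patch_site *s)
{
    static const unsigned char insn[] = {
        [PV_IRQ_DISABLE] = 0xfa,    /* cli */
        [PV_IRQ_ENABLE]  = 0xfb,    /* sti */
    };

    s->addr[0] = insn[s->op];
    memset(s->addr + 1, 0x90, s->len - 1);  /* 0x90 = one-byte nop */
}

A hypervisor backend would instead patch in a call to its own replacement, which is why every site is padded out to a fixed width in the first place.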

There are no ABI requirements tied to these, merely the presence of any usable API for them in paravirt-ops.

Linus is right: our virtual hardware is an exact replica of real hardware, so no matter how you change paravirtualization in the kernel, anything that runs on real hardware will continue to run on VMware. VMI is tied very closely to the hardware, on purpose, and follows the rules of native hardware extremely closely. So you can twist and abuse paravirt-ops in any number of ways, and as long as the result still runs on real hardware and meets the above four requirements, it still runs even on VMI. Violate the four requirements and it costs a lot of performance, but we still continue to run.

Zach