Re: [PATCH] Add I/O hypercalls for i386 paravirt

From: Zachary Amsden
Date: Wed Aug 22 2007 - 16:48:57 EST


Andi Kleen wrote:

> How is that measured? In a loop? In the same pipeline state?
> It seems a little dubious to me.

I did the experiments in a controlled environment, with interrupts disabled and care taken to get the pipeline into the same state. It was a perfectly repeatable experiment. I don't have the exact cycle times anymore, but they were the tightest measurements I've ever seen on cycle counts, because of the unique nature of serializing the processor for the fault / privilege transition. I tested a variety of different conditions, including different types of #GP (yes, the cost does vary), #NP, #PF, sysenter, and int $0xxx. Sysenter was the fastest, by far. Int was about 5x the cost. #GP and friends all had similar costs. #PF was the most expensive.
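For the curious, a minimal sketch of that kind of harness, redone as a userspace approximation (the real runs were kernel-mode with interrupts disabled; here getpid via int $0x80 stands in for the transition being timed, and it assumes a 32-bit build):

	#include <stdint.h>
	#include <stdio.h>

	/* Serialize the pipeline with cpuid, then read the TSC, so every
	 * sample starts from the same pipeline state. */
	static inline uint64_t serialized_rdtsc(void)
	{
		uint32_t lo, hi;
		asm volatile("cpuid" : "=a"(lo) : "a"(0)
			     : "ebx", "ecx", "edx", "memory");
		asm volatile("rdtsc" : "=a"(lo), "=d"(hi));
		return ((uint64_t)hi << 32) | lo;
	}

	int main(void)
	{
		long ret;
		uint64_t t0, t1;

		t0 = serialized_rdtsc();
		/* one int $0x80 round trip: getpid is syscall 20 on i386 */
		asm volatile("int $0x80" : "=a"(ret) : "a"(20) : "memory");
		t1 = serialized_rdtsc();

		printf("int $0x80 round trip: %llu cycles (pid %ld)\n",
		       (unsigned long long)(t1 - t0), ret);
		return 0;
	}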


>>>> to verify protection in the page tables mapping the page allows execution (P, !NX, and U/S check). This is a lot more expensive than a

>>> When the page is not executable or not present you get #PF, not #GP. So the hardware already checks that.

>>> The only case where you would need to check yourself is if you emulate
>>> NX on non-NX-capable hardware, but I can't see you doing that.

>> No, it doesn't. Between the #GP and decode, you have an SMP race where another processor can rewrite the instruction.

> That can be ignored imho. If the page goes away you'll notice
> when you handle the page fault on read. If it becomes NX then the execution
> just happened to be logically a little earlier.


No, you can't ignore it. The page protections won't change between the #GP and the decoder's execution, but the instruction can, causing you to decode into the next page where the processor would not have. !P becomes obvious, but failure to respect NX or U/S is an exploitable bug. Put a 1-byte instruction in the last byte of a page that borders an NX (or supervisor) page. From another processor, keep switching that byte between the instruction and a segment-override prefix.

Result: a user executes an instruction on a supervisor code page, learning data as a result; code on an NX page gets executed.
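If the window isn't obvious, here is the race reduced to a toy model in C; two threads stand in for the rewriting processor and the #GP-to-decode window, and the byte values (0x90 = 1-byte nop, 0x3e = DS segment-override prefix) are purely illustrative:

	#include <pthread.h>
	#include <stdint.h>
	#include <stdio.h>

	static volatile uint8_t last_byte = 0x90;	/* nop: decode ends in-page */

	/* The other processor: keeps switching the byte between a complete
	 * instruction and a prefix that would pull decode across the page. */
	static void *rewriter(void *arg)
	{
		(void)arg;
		for (;;)
			last_byte = (last_byte == 0x90) ? 0x3e : 0x90;
		return NULL;
	}

	int main(void)
	{
		pthread_t t;
		pthread_create(&t, NULL, rewriter, NULL);

		for (long i = 0; ; i++) {
			uint8_t faulted = last_byte;	/* byte the CPU took #GP on */
			uint8_t decoded = last_byte;	/* byte the emulator decodes */
			if (faulted != decoded) {
				printf("raced after %ld tries: %#x became %#x\n",
				       i, faulted, decoded);
				return 0;
			}
		}
	}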

> Or easier to just write a backend for the lguest virtio drivers;
> that will likely be faster in the end anyway than this gross
> hack.

We already have drivers for all of our hardware in Linux. Most of the hardware we emulate is physical hardware, and there are no virtual drivers for it. Should we take the BusLogic driver and "paravirtualize" it by adding VMI hypercalls? We might benefit from it, but would the BusLogic driver? It sets a nasty precedent for maintenance as different hypervisors and emulators hack up different drivers for their own performance.

Our SCSI and IDE emulation, and thus the drivers Linux uses, are pretty much set in stone; we are not going to go about changing a tricky hardware interface into a virtual one, as that is simply too risky for something as critical as storage. We might be able to move our network driver over to virtio, but that is not a short-term prospect either.

There is great advantage in talking to our existing device layer faster, and this is something that is valuable today.

> Really LinuxHAL^wparavirt ops is already so complicated that
> any new hooks need an extremely good justification, and that is
> just not here for this.

> We can add it if you find an equivalent number of hooks
> to eliminate.

Interesting trade. What if I sanitized the whole mess of I/O macros into something fun and friendly:

u32 native_port_in(int port, iosize_t opsize, int delay);
void native_port_out(int port, iosize_t opsize, u32 output, int delay);
void native_port_string_in(int port, void *ptr, iosize_t opsize, unsigned count, int delay);
void native_port_string_out(int port, void *ptr, iosize_t opsize, unsigned count, int delay);

Then we can be rid of all the macro goo in io.h, which frightens my mother. We might even be able to get rid of the umpteen different places where drivers wrap I/O-space access with their own byte / word / long functions so they can switch between port I/O and memory-mapped I/O, by moving it all into common infrastructure.
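To be concrete, the native backend for one of these could be a thin switch over the existing instructions. A sketch only; iosize_t's values and the port-0x80 delay write are my assumptions:

	#include <linux/types.h>

	typedef enum { IO_SIZE_BYTE = 1, IO_SIZE_WORD = 2, IO_SIZE_LONG = 4 } iosize_t;

	static inline void native_io_delay(void)
	{
		asm volatile("outb %%al, $0x80" : : "a"(0));	/* traditional dummy write */
	}

	void native_port_out(int port, iosize_t opsize, u32 output, int delay)
	{
		switch (opsize) {
		case IO_SIZE_BYTE:
			asm volatile("outb %b0, %w1" : : "a"(output), "Nd"(port));
			break;
		case IO_SIZE_WORD:
			asm volatile("outw %w0, %w1" : : "a"(output), "Nd"(port));
			break;
		case IO_SIZE_LONG:
			asm volatile("outl %0, %w1" : : "a"(output), "Nd"(port));
			break;
		}
		if (delay)
			native_io_delay();
	}

A hypervisor would then override exactly four entry points with batching hypercall versions, instead of hooking every macro in io.h.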

We could make similar (unwelcome?) advances on the pte functions if it were not for the regrettable disconnect between pte_high / pte_low and the rest. Perhaps if it were hidden in macros?
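For anyone without the PAE code in front of them, the disconnect is roughly this (a sketch of the 3-level case; the write ordering is why the two halves can't just be blindly fused into one hook):

	#include <linux/types.h>
	#include <asm/system.h>		/* for smp_wmb() */

	/* i386 with PAE: a pte is 64 bits, kept as two 32-bit halves. */
	typedef struct { u32 pte_low, pte_high; } pte_t;

	static inline void native_set_pte(pte_t *ptep, pte_t pte)
	{
		ptep->pte_high = pte.pte_high;	/* high half first ... */
		smp_wmb();
		ptep->pte_low = pte.pte_low;	/* ... the present bit lands
						   last, so a racing walker
						   never sees a half-written
						   present pte */
	}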

Zach