Re: lmbench lat_mmap slowdown with CONFIG_PARAVIRT
From: H. Peter Anvin
Date: Thu Jan 22 2009 - 18:55:52 EST
Zachary Amsden wrote:
On Thu, 2009-01-22 at 14:49 -0800, H. Peter Anvin wrote:
There is also the option to use assembly wrappers to avoid relying on
the calling convention. This is particularly so since we have sites
where as little as a two-byte instruction gets bloated up with huge
push/pop sequences around a tiny instruction. Those would be better
served with a direct call to a stub (5 bytes), which would be repatched
to the two-byte instruction + 3 byte nop.
Yes, for known trivial ops (most!), there isn't any reason to ever have
a call to begin with; simply an inline instruction sequence would be
fine, and only those callers that override the sequence would need to
patch. It's possible to write clever macros to assure there is always
space for a 5 byte call.
It's functionally speaking the same thing... the advantage with starting
out with the call and then patch in the native code as opposed to the
other way around is to be able to handle things properly before we're
ready to run the patching code.
Right now a number of the call sites contain a huge push/pop sequence
followed by an indirect call. We can patch in the native code to avoid
the branch overhead, but the register constraints and icache footprint
is unchanged.
-hpa
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/