Re: Efficient x86 and x86_64 NOP microbenchmarks

From: Andi Kleen
Date: Wed Aug 13 2008 - 15:36:19 EST


> Sorry to ask, I feel I must be missing something, but I'm trying to
> figure out where you propose to add the "call mcount" ? In the caller or
> in the callee ?

callee like gcc. caller would be likely more bloated because
there are more calls than functions. Also if it was at the
callee more code would be needed because the function currently
executed couldn't be gotten from stack directly.

> Or is it a different scheme I don't see ? I am trying to figure out how
> you happen to do all that without dynamic code modification and manage
> not to hurt performance.

The dynamic code modification is only needed because there is no
global table of the mcount call sites. So instead it discovers
them at runtime, but that requires runtime save patching

With a custom call scheme one could just build up a table of
call sites at link time using an ELF section and then when
tracing is enabled/disabled always patch them all in one go
in a stop_machine(). Then you wouldn't need parallel execution safe
patching anymore and it doesn't matter what the nops look like.

The other advantage is that it would allow getting rid of
the frame pointer.

-Andi

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/