Peter, do you have something like the following code in mind?
I think the main differences between the code snippet below and the
markers are that markers rely on the compiler to generate the stack
setup, and keep this code a little bit closer to the function than what
I propose here, where I put the stack setup code in a "farfaraway"
section. Moreover, markers are much simpler than what I show here.
And actually, markers can be deployed portably, with
architecture-specific optimizations refined later. The approach shown
here, by contrast, has to be implemented all up front for every traced
architecture. In addition, dealing with weird types like
unsigned long long can become a pain.
Also, due to the fact that we are asking the compiler to keep some
variables live in registers, I would be tempted to embed this in a block
controlled by an if() statement (a conditional branch, like I use for
the markers) so that, if the variables are not already live at the
marker site, we don't have to pay the penalty of populating the
registers when tracing is not required.
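
To make the if() idea concrete, here is a minimal user-space sketch of a
marker site guarded by a conditional branch. It is only an illustration
under stated assumptions: example_marker_enabled, example_probe() and
EXAMPLE_MARK() are hypothetical names, not the actual marker API, and
__builtin_expect is a GCC/Clang builtin. The point is that the probe
arguments are only set up on the unlikely path, so a disabled site costs
a load, a test and a branch.

```c
#include <stdio.h>

/* Hypothetical names for illustration only; not the actual marker API. */
static volatile int example_marker_enabled;	/* 0 = tracing disabled */
static int example_probe_calls;			/* counts probe invocations */

/* Stand-in for a probe; a real one would write to a trace buffer. */
static void example_probe(const char *name, unsigned long long value)
{
	example_probe_calls++;
	printf("%s: %llu\n", name, value);
}

/*
 * The argument setup lives inside the if() block, so the compiler only
 * has to populate the argument registers when the branch is taken.
 */
#define EXAMPLE_MARK(name, value)					\
	do {								\
		if (__builtin_expect(example_marker_enabled, 0))	\
			example_probe(name, value);			\
	} while (0)

static unsigned long long traced_function(unsigned long long x)
{
	unsigned long long y = x * 3;

	EXAMPLE_MARK("traced_function", y);
	return y + 1;
}
```

With example_marker_enabled left at 0, traced_function() never calls the
probe; setting it to 1 makes each call hit the probe once.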