I hear (er, read) you. Emulation may turn out to be the answer for some
architectures. But here are some things to keep in mind about the
various approaches:
1. Single-stepping inline is easiest: you need to know very little about
the instruction set you're probing. But it's inadequate for
multithreaded apps.
2. Single-stepping out of line solves the multithreading issue (as do #3
and #4), but requires more knowledge of the instruction set. (In
particular, calls, jumps, and returns need special care, as do
RIP-relative instructions on x86_64.) I count 9 architectures that
support kprobes. I think most of these do SSOL.
3. "Boosted" probes (where an appended jump instruction removes the need
for the single-step trap on many instructions) require even more
knowledge of the instruction set, and like SSOL, require XOL slots.
Right now, as far as I know, x86 is the only architecture with boosted
kprobes.
4. Emulation removes the need for the XOL area, but requires pretty much
total knowledge of the instruction set. It's also a performance win for
architectures that can't do #3. I see KVM implemented on 4
architectures (ia64, powerpc, s390, x86). Coincidentally, those are the
architectures to which uprobes (old uprobes, with ubp and xol bundled
in) has already been ported (though Intel hasn't been maintaining their
ia64 port). So it sort of comes down to how objectionable the XOL vma
(or page) really is.
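
To make the "special care" in #2 a bit more concrete, here is a rough sketch of the kind of fixups SSOL needs after stepping a copied instruction in an XOL slot. It is not taken from any kernel source; the enum, the function names, and the three-way instruction classification are all made up for illustration, and a real implementation has to decode far more cases. The third helper shows one possible way to handle a RIP-relative operand, by adjusting the 32-bit displacement in the copy, which only works when the slot is within reach of the original target.

```c
#include <stdint.h>

/* Hypothetical, much-simplified classification of the probed instruction. */
enum insn_kind {
	INSN_PLAIN,          /* falls through to the next instruction */
	INSN_CALL_REL,       /* relative call: pushes a return address */
	INSN_JMP_ABS_OR_RET  /* absolute/indirect jump or return */
};

/*
 * IP to resume at after single-stepping the copy at slot_addr of the
 * instruction that originally lived at orig_addr.
 */
static uint64_t ssol_fixup_ip(enum insn_kind kind, uint64_t orig_addr,
			      uint64_t slot_addr, uint64_t ip_after_step)
{
	/* An absolute target is already correct; leave it alone. */
	if (kind == INSN_JMP_ABS_OR_RET)
		return ip_after_step;

	/* Otherwise the IP is relative to the slot: rebase it onto orig_addr. */
	return ip_after_step - slot_addr + orig_addr;
}

/*
 * A call stepped in the slot pushes a return address that points into the
 * slot; rebase it so the callee returns to the original code.
 */
static uint64_t ssol_fixup_call_retaddr(uint64_t orig_addr, uint64_t slot_addr,
					uint64_t pushed_retaddr)
{
	return pushed_retaddr - slot_addr + orig_addr;
}

/*
 * Rewrite a RIP-relative displacement so the copy in the slot still refers
 * to the original target. Returns 0 on success, -1 if the adjusted value
 * no longer fits in a signed 32-bit displacement.
 */
static int riprel_adjust_disp(uint64_t orig_addr, uint64_t slot_addr,
			      int32_t disp, int32_t *new_disp)
{
	int64_t d = (int64_t)disp + (int64_t)(orig_addr - slot_addr);

	if (d < INT32_MIN || d > INT32_MAX)
		return -1;
	*new_disp = (int32_t)d;
	return 0;
}
```

Emulation (#4) sidesteps all of this because the instruction is never executed anywhere but in software, at the price of having to model each instruction's effects instead of just its control flow and addressing.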