I would also like to point out that maintaining a _separated_ piece of
code for each instrumentation site which would heavily depend on the
inner kernel data structures seems like a maintenance nightmare.
I would be happy with a solution that doesn't depend on this gigantic
DWARF information and can be included in the kernel build process. See,
I think tracing is, primarily, a facility that the kernel should provide
to users so they can tune and find problems in their own applications.
From this POV, it would make sense to consider tracing as part of the
kernel code itself, not as a separated, kernel debugging oriented piece
of code. If you require per-site dynamic pieces of code, you are only
adding to the complexity of such a tracer. Actually, an active tracer
would trash the i-cache quite heavily due to these per-site pieces of
code. Given that users want a tracer that disturbs as little as
possible the normal system behavior, I don't think this "per-site"
pieces of code approach is that good.
Instruction cache bloat inspection :
If a code region is placed with cache cold instructions (unlikely
branches), it should not increase the cache impact, since although we
might use one more cache line, it won't be often loaded in cache because
all the code that shares this cache line is unlikely.
I therefore think that looking only at code size is misleading when
considering the cache impact of markers, since they have been designed
to put the bytes as far away as possible from cache-hot memory.