[RFC] perf: need to expose sched_clock to correlate user samples with kernel samples

From: Stephane Eranian
Date: Tue Oct 16 2012 - 06:13:33 EST


There are many situations where we want to correlate events happening at
the user level with samples recorded in the perf_event kernel sampling buffer.
For instance, we might want to correlate a function call or the creation of
a file with samples. Similarly, when we want to monitor a JVM with jitted code,
we need to be able to correlate jitted code mappings with perf event samples
for symbolization.

Perf_events allows samples to be timestamped with PERF_SAMPLE_TIME.
With it, each PERF_RECORD_SAMPLE includes a timestamp generated by
local_clock(), which in turn calls sched_clock_cpu().
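For reference, here is a minimal sketch of the sampling setup being discussed. The attribute fields and constants come from <linux/perf_event.h>; the helper name setup_timed_sampling() and struct timed_sample are hypothetical, and the struct layout assumes that exactly IP, TID and TIME are requested, in the field order documented in the perf_event_open(2) man page.

```c
#include <string.h>
#include <linux/perf_event.h>

/* Hypothetical helper: sample CPU cycles and ask the kernel to put a
 * timestamp (PERF_SAMPLE_TIME) into every PERF_RECORD_SAMPLE. */
static void setup_timed_sampling(struct perf_event_attr *attr, __u64 period)
{
    memset(attr, 0, sizeof(*attr));
    attr->size = sizeof(*attr);
    attr->type = PERF_TYPE_HARDWARE;
    attr->config = PERF_COUNT_HW_CPU_CYCLES;
    attr->sample_period = period;
    attr->sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_TID | PERF_SAMPLE_TIME;
    attr->disabled = 1; /* enable later with PERF_EVENT_IOC_ENABLE */
}

/* Layout of a PERF_RECORD_SAMPLE for exactly the sample_type above
 * (assumption: no other PERF_SAMPLE_* bits are set). */
struct timed_sample {
    struct perf_event_header header;
    __u64 ip;        /* PERF_SAMPLE_IP */
    __u32 pid, tid;  /* PERF_SAMPLE_TID */
    __u64 time;      /* PERF_SAMPLE_TIME: kernel-side local_clock() value */
};
```

The `time` field is the value user-level code currently has no way to reproduce.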

To make correlating user and kernel samples easy, user-level code would
need access to that same sched_clock() functionality. However, none of the
existing clock calls currently permit this: they all return timestamps that
do not use the same source and/or offset as sched_clock().
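To make the goal concrete, here is a sketch of what the correlation would look like if user-level timestamps and PERF_SAMPLE_TIME did share a clock: a JIT logs a record each time it installs code, and a post-processing tool resolves a sample's IP using the mapping that was live at the sample's timestamp. struct jit_map and resolve_jit_ip() are hypothetical names; the sketch assumes records are sorted by timestamp and that the common clock exists, which is exactly what is missing today.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record a JIT could log when it emits code. Its ts must
 * come from the same clock as PERF_SAMPLE_TIME for this to work. */
struct jit_map {
    uint64_t ts;      /* when the code was installed */
    uint64_t start;   /* code address range [start, end) */
    uint64_t end;
    const char *name; /* symbol for the jitted code */
};

/* Return the name of the most recent mapping installed at or before
 * sample_ts that covers ip, or NULL. maps[] must be sorted by ts. */
static const char *resolve_jit_ip(const struct jit_map *maps, size_t n,
                                  uint64_t ip, uint64_t sample_ts)
{
    /* Binary search for the first record with ts > sample_ts. */
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (maps[mid].ts <= sample_ts)
            lo = mid + 1;
        else
            hi = mid;
    }
    /* Walk back over mappings already live at sample time, newest first,
     * so a reused code-cache address resolves to its latest owner. */
    while (lo > 0) {
        const struct jit_map *m = &maps[--lo];
        if (ip >= m->start && ip < m->end)
            return m->name;
    }
    return NULL;
}
```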

I believe a similar issue exists with the ftrace subsystem.

The problem needs to be addressed in a portable manner. Solutions in which
user-level code reads the TSC to reconstruct sched_clock() don't seem
appropriate to me.

One way to address this limitation would be to extend clock_gettime()
with a new clock id, e.g., CLOCK_PERF.

However, I understand that sched_clock_cpu() provides ordering guarantees only
when invoked repeatedly on the same CPU, i.e., it is not globally synchronized.
But we already have to deal with this problem when merging samples obtained
from different CPU sampling buffers in per-thread mode, so this is not
a showstopper.
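The merge step referred to here can be sketched as a k-way merge on the time field, under the assumption that each input buffer is already in timestamp order (e.g. a per-CPU buffer stamped by a single CPU's clock). Ordering across buffers is then only as accurate as the synchronization between the CPUs' clocks, the same caveat that would apply to a CLOCK_PERF. struct sample, struct stream and merge_streams() are illustrative names, not perf tool internals.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal stand-in for a decoded sample: only the fields needed here. */
struct sample {
    uint64_t time; /* PERF_SAMPLE_TIME value */
    uint32_t cpu;
};

/* One buffer's samples, assumed to be in non-decreasing time order. */
struct stream {
    const struct sample *s;
    size_t n;
    size_t pos;
};

/* k-way merge by timestamp into out[]; returns the number of samples
 * written. Selection is O(k) per sample, fine for a few buffers. */
static size_t merge_streams(struct stream *st, size_t k,
                            struct sample *out, size_t cap)
{
    size_t count = 0;
    while (count < cap) {
        struct stream *best = NULL;
        for (size_t i = 0; i < k; i++) {
            if (st[i].pos == st[i].n)
                continue;
            /* Precondition check: each input stream is sorted. */
            assert(st[i].pos == 0 ||
                   st[i].s[st[i].pos - 1].time <= st[i].s[st[i].pos].time);
            if (!best || st[i].s[st[i].pos].time < best->s[best->pos].time)
                best = &st[i];
        }
        if (!best)
            break; /* all streams drained */
        out[count++] = best->s[best->pos++];
    }
    return count;
}
```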

An alternative would be to use uprobes, but they are less practical to set up.

Anyone with better ideas?