Agreed on remote callchains and maintaining consistency about what the
tracepoints mean.
As I said on the other thread, the problem with post-processing in
userspace is that we collect more info than we actually need, and under
load perf record can't keep up.
Attached is an alternative approach that does what you allude to above.

  perf record -agPe sched:sched_switch --filter "delay > 1000000" -- sleep 1

allows us to collect a lot less. For some reason, "perf script" shows
the correct delay field, but the sample period still contains 1 (i.e.
the __perf_count() hint is not working for me).
-Arun
+#ifdef CONFIG_SCHEDSTATS

The previous code is hard to read...

+ __entry->delay = next->se.statistics.block_start ? next->se.statistics.block_start
+ : next->se.statistics.sleep_start ? next->se.statistics.sleep_start : 0;
+ __entry->delay = __entry->delay ? now - __entry->delay : 0;

next->se.statistics.{block,sleep}_start should be zeroed here, otherwise
the next sched_switch will report a non-zero delay again.

+#else
+ __entry->delay = 0;
+#endif
+ )
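
For what it's worth, here is a rough userspace sketch of the delay
calculation with the reset that the review comment asks for. It also
flattens the nested ternary. struct sched_stats_model and switch_delay()
are made-up names for illustration only, not kernel code; in the patch
the timestamps live in next->se.statistics and only exist under
CONFIG_SCHEDSTATS.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two schedstat timestamps used by the patch. */
struct sched_stats_model {
	uint64_t block_start;	/* nonzero if a block timestamp is pending */
	uint64_t sleep_start;	/* nonzero if a sleep timestamp is pending */
};

/*
 * Return the time elapsed since the pending timestamp, or 0 if none is
 * pending. Both timestamps are cleared before returning so that a later
 * switch does not report the same delay again.
 */
uint64_t switch_delay(struct sched_stats_model *st, uint64_t now)
{
	uint64_t start;

	assert(st != NULL);

	/* Prefer block_start and fall back to sleep_start, as the patch does. */
	start = st->block_start ? st->block_start : st->sleep_start;

	/* The reset requested in review. */
	st->block_start = 0;
	st->sleep_start = 0;

	return start ? now - start : 0;
}
```

With block_start = 100 and now = 350, the first call returns 250. A
second call on the same struct returns 0 because both timestamps were
cleared.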