On Thu, 4 Mar 2021 11:01:37 +0200
"Yordan Karadzhov (VMware)" <y.karadz@xxxxxxxxx> wrote:
Thanks Yordan for doing this!
I have some comments below.
diff --git a/kernel/trace/trace_entries.h b/kernel/trace/trace_entries.h
index 4547ac59da61..8007f9b6417f 100644
--- a/kernel/trace/trace_entries.h
+++ b/kernel/trace/trace_entries.h
@@ -338,3 +338,19 @@ FTRACE_ENTRY(hwlat, hwlat_entry,
__entry->nmi_total_ts,
__entry->nmi_count)
);
+
+FTRACE_ENTRY(func_repeats, func_repeats_entry,
+
+ TRACE_FUNC_REPEATS,
+
+ F_STRUCT(
+ __field( unsigned long, ip )
+ __field( unsigned long, pip )
+ __field( unsigned long, count )
+ ),
+
+ F_printk(" %ps <-%ps\t(repeats:%lu)",
+ (void *)__entry->ip,
+ (void *)__entry->pip,
+ __entry->count)
After playing with this a little, I realized that we should also store the
last timestamp as well. I think that would be interesting information.
<...>-37 [004] ...1 2022.303820: gc_worker <-process_one_work
<...>-37 [004] ...1 2022.303820: ___might_sleep <-gc_worker
<...>-37 [004] ...1 2022.303831: ___might_sleep <-gc_worker (repeats: 127)
<...>-37 [004] ...1 2022.303831: queue_delayed_work_on <-process_one_work
The above shows that __might_sleep() was called 128 times, but what I don't
get from the above, is when that last call was made. You'll see that the
timestamp for the repeat output is the same as the next function shown
(queue_delayed_work_on()). But the timestamp for the last call to
__might_sleep() is lost, and the repeat event ends up being written when
it is detected that there are no more repeats.
If we had:
<...>-37 [004] ...1 2022.303820: gc_worker <-process_one_work
<...>-37 [004] ...1 2022.303820: ___might_sleep <-gc_worker
<...>-37 [004] ...1 2022.303831: ___might_sleep <-gc_worker (last ts: 2022.303828 repeats: 127)
<...>-37 [004] ...1 2022.303831: queue_delayed_work_on <-process_one_work
We would know the last time __might_sleep was called.
That is, not only should we save the ip and pip in the trace_func_repeats
structure, but we should also be storing the last time stamp of the last
function event that repeated. Otherwise the above looks like the last
__might_sleep called above happened when the queue_delayed_work_on
happened, where that may not be the case.
-- Steve