Re: [RFC PATCH 1/5] tracing: Define new ftrace event "func_repeats"

From: Steven Rostedt
Date: Thu Mar 04 2021 - 11:39:48 EST


On Thu, 4 Mar 2021 11:01:37 +0200
"Yordan Karadzhov (VMware)" <y.karadz@xxxxxxxxx> wrote:

Thanks Yordan for doing this!

I have some comments below.

> diff --git a/kernel/trace/trace_entries.h b/kernel/trace/trace_entries.h
> index 4547ac59da61..8007f9b6417f 100644
> --- a/kernel/trace/trace_entries.h
> +++ b/kernel/trace/trace_entries.h
> @@ -338,3 +338,19 @@ FTRACE_ENTRY(hwlat, hwlat_entry,
> __entry->nmi_total_ts,
> __entry->nmi_count)
> );
> +
> +FTRACE_ENTRY(func_repeats, func_repeats_entry,
> +
> + TRACE_FUNC_REPEATS,
> +
> + F_STRUCT(
> + __field( unsigned long, ip )
> + __field( unsigned long, pip )
> + __field( unsigned long, count )
> + ),
> +
> + F_printk(" %ps <-%ps\t(repeats:%lu)",
> + (void *)__entry->ip,
> + (void *)__entry->pip,
> + __entry->count)

After playing with this a little, I realized that we should also store the
last timestamp as well. I think that would be interesting information.

<...>-37 [004] ...1 2022.303820: gc_worker <-process_one_work
<...>-37 [004] ...1 2022.303820: ___might_sleep <-gc_worker
<...>-37 [004] ...1 2022.303831: ___might_sleep <-gc_worker (repeats: 127)
<...>-37 [004] ...1 2022.303831: queue_delayed_work_on <-process_one_work

The above shows that __might_sleep() was called 128 times, but what I don't
get from the above, is when that last call was made. You'll see that the
timestamp for the repeat output is the same as the next function shown
(queue_delayed_work_on()). But the timestamp for the last call to
__might_sleep() is lost, and the repeat event ends up being written when
it is detected that there are no more repeats.

If we had:

<...>-37 [004] ...1 2022.303820: gc_worker <-process_one_work
<...>-37 [004] ...1 2022.303820: ___might_sleep <-gc_worker
<...>-37 [004] ...1 2022.303831: ___might_sleep <-gc_worker (last ts: 2022.303828 repeats: 127)
<...>-37 [004] ...1 2022.303831: queue_delayed_work_on <-process_one_work

We would know the last time __might_sleep was called.

That is, not only should we save the ip and pip in the trace_func_repeats
structure, but we should also be storing the last time stamp of the last
function event that repeated. Otherwise the above looks like the last
__might_sleep called above happened when the queue_delayed_work_on
happened, where that may not be the case.

-- Steve