[RFC PATCH 0/2] Tracing bursts of latencies

From: Viktor Rosendahl
Date: Tue Jan 19 2021 - 11:53:21 EST


Hello all,

This series contains two things:

1. A fix for a bug in the ftrace latency tracers that appeared with Linux 5.7.

2. The latency-collector, a tool that is designed to work around the
limitations in the ftrace latency tracers. It needs the bug fix in order to
work properly.

I have sent a patch series with the latency-collector before.

I never got any comments on it, and I stopped pushing it because I thought that
BPF tracing would be the wave of the future and would solve the problem in a
cleaner and more elegant way.

Recently, I tried out the criticalstat script from bcc tools but it did not
fulfill all of my hopes and dreams.

On the bright side, it was able to capture all latencies in a burst. The main
problems that I encountered were:

1. The system became unstable and froze now and then. The man page of
criticalstat has a mention of it being unstable, so I assume that this is a
known problem.

2. Sometimes the stack traces were incorrect, though not in an obvious way.
Once this had happened, all subsequent stack traces were bad as well.

3. If two instances were run simultaneously (to capture both preempt-off and
irq-off latencies), there seemed to be a fairly large performance hit, although
I did not measure it exactly.

4. The filesystem footprint seemed quite large. libbcc in particular seemed
big for a small embedded system.

For these reasons, I take the liberty of resending the latency-collector.

I hope to get some comments on it, or a suggestion of an alternative approach
to the problem of capturing latencies that systematically occur close to each
other.

Admittedly, this may be somewhat of a niche problem from a developer's
perspective, since removing one latency will reveal the next. However, when one
is doing validation with a fleet of devices in a long and expensive test
campaign, it is quite desirable not to lose any latencies.

best regards,

Viktor

Viktor Rosendahl (2):
  Use pause-on-trace with the latency tracers
  Add the latency-collector to tools

 kernel/trace/trace_irqsoff.c      |    4 +
 tools/Makefile                    |   14 +-
 tools/tracing/Makefile            |   20 +
 tools/tracing/latency-collector.c | 1212 +++++++++++++++++++++++++++++
 4 files changed, 1244 insertions(+), 6 deletions(-)
 create mode 100644 tools/tracing/Makefile
 create mode 100644 tools/tracing/latency-collector.c

--
2.25.1