[PATCH v2 0/4] tracing: Add Hardware Latency detector tracer
From: Steven Rostedt
Date: Wed Aug 10 2016 - 14:51:38 EST
This adds the PREEMPT_RT hwlat detector as a Linux tracer in mainline.
In the PREEMPT_RT patch set, it is a separate entity that is controlled
by the debugfs file system. I found that it is better suited as a
latency tracer in the tracing directory, as it follows pretty much the
same paradigm as the other latency tracers that already exist. All
I had to add was a hwlat_detector directory that contained a window
and width for the period and duration respectively of the test. But
the samples would just write to the tracing ring buffer and the max
latency would be stored in tracing_max_latency, and the threshold can
be set by the existing tracing_threshold. The last patch also adds a
new feature that would have the kthread migrate after each period to
another CPU specified by tracing_cpumask.
Here's the documention that is added to hwlat_detector.txt in the
Documentation directory:
Introduction:
-------------
The tracer hwlat_detector is a special purpose tracer that is used to
detect large system latencies induced by the behavior of certain underlying
hardware or firmware, independent of Linux itself. The code was developed
originally to detect SMIs (System Management Interrupts) on x86 systems,
however there is nothing x86 specific about this patchset. It was
originally written for use by the "RT" patch since the Real Time
kernel is highly latency sensitive.
SMIs are not serviced by the Linux kernel, which means that it does not
even know that they are occuring. SMIs are instead set up by BIOS code
and are serviced by BIOS code, usually for "critical" events such as
management of thermal sensors and fans. Sometimes though, SMIs are used for
other tasks and those tasks can spend an inordinate amount of time in the
handler (sometimes measured in milliseconds). Obviously this is a problem if
you are trying to keep event service latencies down in the microsecond range.
The hardware latency detector works by hogging one of the cpus for configurable
amounts of time (with interrupts disabled), polling the CPU Time Stamp Counter
for some period, then looking for gaps in the TSC data. Any gap indicates a
time when the polling was interrupted and since the interrupts are disabled,
the only thing that could do that would be an SMI or other hardware hiccup
(or an NMI, but those can be tracked).
Note that the hwlat detector should *NEVER* be used in a production environment.
It is intended to be run manually to determine if the hardware platform has a
problem with long system firmware service routines.
Usage:
------
Write the ASCII text "hwlat" into the current_tracer file of the tracing system
(mounted at /sys/kernel/tracing or /sys/kernel/tracing). It is possible to
redefine the threshold in microseconds (us) above which latency spikes will
be taken into account.
Example:
# echo hwlat > /sys/kernel/tracing/current_tracer
# echo 100 > /sys/kernel/tracing/tracing_thresh
The /sys/kernel/tracing/hwlat_detector interface contains the following files:
width - time period to sample with CPUs held (usecs)
must be less than the total window size (enforced)
window - total period of sampling, width being inside (usecs)
By default the width is set to 500,000 and window to 1,000,000, meaning that
for every 1,000,000 usecs (1s) the hwlat detector will spin for 500,000 usecs
(0.5s). If tracing_thresh contains zero when hwlat tracer is enabled, it will
change to a default of 10 usecs. If any latencies that exceed the threshold is
observed then the data will be written to the tracing ring buffer.
The minimum sleep time between periods is 1 millisecond. Even if width
is less than 1 millisecond apart from window, to allow the system to not
be totally starved.
If tracing_thresh was zero when hwlat detector was started, it will be set
back to zero if another tracer is loaded. Note, the last value in
tracing_thresh that hwlat detector had will be saved and this value will
be restored in tracing_thresh if it is still zero when hwlat detector is
started again.
The following tracing directory files are used by the hwlat_detector:
in /sys/kernel/tracing:
tracing_threshold - minimum latency value to be considered (usecs)
tracing_max_latency - maximum hardware latency actually observed (usecs)
hwlat_detector/width - specified amount of time to spin within window (usecs)
hwlat_detector/window - amount of time between (width) runs (usecs)
Changes since v1:
. Added checks for CONFIG_GENERIC_SCHED_CLOCK, and if that is set,
only nmi_counts will be recorded when an NMI is triggered, but not
the time in NMI, because the generic sched_clock is not NMI safe.
(suggested by Sebastian Andrzej Siewior)
. Changed the format a little to be a little nicer looking.
Was: [99] inner:13 outer:16 ts:1470352534.886878855
Now: #99 inner/outer(us): 13/16 ts:1470352534.886878855
. Included the NMI patch into the main patch set.
. Holding off SMI counting until I have an arch_smi_count() that isn't
so "really yucky".
Jon Masters (1):
tracing: Add documentation for hwlat_detector tracer
Steven Rostedt (Red Hat) (3):
tracing: Added hardware latency tracer
tracing: Have hwlat trace migrate across tracing_cpumask CPUs
tracing: Add NMI tracing in hwlat detector
----
Documentation/trace/hwlat_detector.txt | 79 ++++
arch/sh/kernel/ftrace.c | 4 +-
include/linux/ftrace_irq.h | 31 +-
kernel/trace/Kconfig | 35 ++
kernel/trace/Makefile | 1 +
kernel/trace/trace.c | 2 +-
kernel/trace/trace.h | 3 +
kernel/trace/trace_entries.h | 27 ++
kernel/trace/trace_hwlat.c | 633 +++++++++++++++++++++++++++++++++
kernel/trace/trace_output.c | 66 ++++
10 files changed, 874 insertions(+), 7 deletions(-)
create mode 100644 Documentation/trace/hwlat_detector.txt
create mode 100644 kernel/trace/trace_hwlat.c