[RFC PATCH 0/1] Add kselftest to detect boot event slowdowns

From: Laura Nao
Date: Thu Jul 25 2024 - 07:06:18 EST


Add a new kselftest to detect and report slowdowns in key boot events. The
test uses ftrace to track timings for specific boot events and compares
these timestamps against reference values provided in YAML format.

The test includes the following files:

- `bootconfig` file: configures ftrace and lists reference key boot
  events.
- `config` fragment: enables boot time tracing and attaches the
  bootconfig file to the kernel image.
- `kprobe_timestamps_to_yaml.py` script: parses the current trace file to
  extract event names and timestamps and writes them to a YAML file. The
  script is intended to be run once to generate initial reference values;
  the generated file is not meant to be stored in the kernel sources but
  should be provided as input to the test itself. YAML format was chosen
  to allow easy integration with per-platform data used in other tests,
  such as the discoverable devices probe test in
  tools/testing/selftests/devices. Another option is to use JSON, as the
  file is not intended for manual editing and JSON is already supported
  by the Python standard library.
- `test_boot_time.py` script: parses the current trace file and compares
  timestamps against the values in the YAML file provided as input.
  Reports a failure if any timestamp differs from the reference value by
  more than the specified delta.
- `trace_utils.py` file: utility functions to mount debugfs and parse the
  trace file to extract relevant information.
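
To make the parse-and-compare flow described above more concrete, here is a
minimal sketch. It is not the code from the patch: the function names, the
regular expression, and the sample trace lines are assumptions for
illustration. It assumes the usual ftrace text layout, where each event line
carries a `<seconds>.<microseconds>:` timestamp followed by the event name,
and it takes the reference values as a plain dict instead of loading them
from YAML.

```python
import re

# Assumed ftrace line layout:
#   "<task>-<pid> [<cpu>] <flags> <timestamp>: <event>: <details>"
TRACE_LINE = re.compile(r"\s(?P<ts>\d+\.\d+):\s+(?P<event>\w+):")


def parse_event_timestamps(trace_text):
    """Return {event_name: timestamp_in_seconds} for the first hit of each event."""
    timestamps = {}
    for line in trace_text.splitlines():
        if line.startswith("#"):  # skip the trace header comments
            continue
        match = TRACE_LINE.search(line)
        if match:
            timestamps.setdefault(match["event"], float(match["ts"]))
    return timestamps


def find_slowdowns(current, reference, delta):
    """Return {event_name: difference} for events deviating by more than delta."""
    return {
        name: abs(current[name] - ref_ts)
        for name, ref_ts in reference.items()
        if name in current and abs(current[name] - ref_ts) > delta
    }


# Hypothetical trace excerpt and reference values, for illustration only.
sample_trace = """\
# tracer: nop
#
          <idle>-0       [000] d..1.     0.412345: populate_rootfs_begin: (populate_rootfs+0x0/0x48)
       swapper/0-1       [000] d..1.     1.234567: run_init_process_begin: (run_init_process+0x0/0xd0)
"""
reference = {"populate_rootfs_begin": 0.410000, "run_init_process_begin": 1.200000}

current = parse_event_timestamps(sample_trace)
slow = find_slowdowns(current, reference, delta=0.01)
```

With these sample values only `run_init_process_begin` exceeds the delta.
The absolute difference is used because the description above speaks of a
timestamp that "differs" from the reference, not only one that is later.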

The bootconfig file provided is an initial draft with some reference kprobe
events to showcase how the test works. If you are interested in running this
test, I would appreciate feedback on which boot events should be added.
Different key events might be relevant depending on the platform and its
boot time requirements. This file should serve as a common ground and be
populated with critical events and functions common to different platforms.

Feedback on the overall approach of this test and suggestions for
additional boot events to trace would be greatly appreciated.

Example output with a deliberately small delta of 0.01 seconds to demonstrate failures:

TAP version 13
1..4
ok 1 populate_rootfs_begin
# 'run_init_process_begin' differs by 0.033990 seconds.
not ok 2 run_init_process_begin
# 'run_init_process_end' differs by 0.033796 seconds.
not ok 3 run_init_process_end
ok 4 unpack_to_rootfs_begin
# Totals: pass:2 fail:2 xfail:0 xpass:0 skip:0 error:0
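
For readers unfamiliar with TAP, the report above can be reproduced with a
few lines of Python. This is only a sketch: the actual test relies on the
ksft helper module from the dependency patch, whereas `format_tap` below is
a hypothetical stand-alone function, and the two differences marked as
hypothetical are invented values that stay within the delta.

```python
def format_tap(differences, delta):
    """Build TAP lines from {event_name: abs_difference_in_seconds}."""
    lines = ["TAP version 13", f"1..{len(differences)}"]
    passed = failed = 0
    # Events are reported in alphabetical order, as in the example output.
    for number, name in enumerate(sorted(differences), start=1):
        diff = differences[name]
        if diff > delta:
            failed += 1
            lines.append(f"# '{name}' differs by {diff:.6f} seconds.")
            lines.append(f"not ok {number} {name}")
        else:
            passed += 1
            lines.append(f"ok {number} {name}")
    lines.append(
        f"# Totals: pass:{passed} fail:{failed} xfail:0 xpass:0 skip:0 error:0"
    )
    return lines


differences = {
    "populate_rootfs_begin": 0.002100,   # hypothetical, within the delta
    "run_init_process_begin": 0.033990,
    "run_init_process_end": 0.033796,
    "unpack_to_rootfs_begin": 0.004500,  # hypothetical, within the delta
}
tap_lines = format_tap(differences, delta=0.01)
print("\n".join(tap_lines))
```

Each event maps to one TAP test case, so a slowdown in a single boot event
shows up as a single `not ok` line preceded by a diagnostic comment.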

This patch depends on "kselftest: Move ksft helper module to common
directory":
https://lore.kernel.org/all/20240705-dev-err-log-selftest-v2-2-163b9cd7b3c1@xxxxxxxxxxxxx/
which was picked through the usb tree and is queued for 6.11-rc1.

Best,

Laura

Laura Nao (1):
  kselftests: Add test to detect boot event slowdowns

 tools/testing/selftests/Makefile             |  1 +
 tools/testing/selftests/boot-time/Makefile   | 17 ++++
 tools/testing/selftests/boot-time/bootconfig |  8 ++
 tools/testing/selftests/boot-time/config     |  4 +
 .../boot-time/kprobe_timestamps_to_yaml.py   | 55 +++++++++++
 .../selftests/boot-time/test_boot_time.py    | 94 +++++++++++++++++++
 .../selftests/boot-time/trace_utils.py       | 63 +++++++++++++
 7 files changed, 242 insertions(+)
 create mode 100644 tools/testing/selftests/boot-time/Makefile
 create mode 100644 tools/testing/selftests/boot-time/bootconfig
 create mode 100644 tools/testing/selftests/boot-time/config
 create mode 100755 tools/testing/selftests/boot-time/kprobe_timestamps_to_yaml.py
 create mode 100755 tools/testing/selftests/boot-time/test_boot_time.py
 create mode 100644 tools/testing/selftests/boot-time/trace_utils.py

--
2.30.2