Re: [PATCH v3 04/13] perf tests: Add robust record retry helper and use subsecond workloads
From: Ian Rogers
Date: Mon Jun 22 2026 - 19:59:40 EST
On Thu, Jun 18, 2026 at 6:25 AM Arnaldo Carvalho de Melo
<acme@xxxxxxxxxx> wrote:
>
> On Wed, Jun 17, 2026 at 03:37:06PM -0700, Namhyung Kim wrote:
> > On Tue, Jun 16, 2026 at 09:48:09AM -0700, Ian Rogers wrote:
> > > Introduce `perf_record_with_retry` and `perf_record_cleanup` in a shared
> > > library `tests/shell/lib/perf_record.sh` to prevent record test failures
> > > caused by transient recording or workload delays.
> > >
> > > Update `record.sh`, `record_lbr.sh`, `pipe_test.sh`, `kvm.sh`, and
> > > `stat_all_pfm.sh` to use this robust record retry logic. These tests now
> > > start with very short durations (e.g. 0.01 seconds) and scale up if the
> > > initial recording failed to capture samples, significantly improving test
> > > execution speed on success while remaining resilient to slow systems.
> > >
> > > Assisted-by: Antigravity:gemini-3.1-pro
> > > Signed-off-by: Ian Rogers <irogers@xxxxxxxxxx>
> > > ---
> > [SNIP]
> > > @@ -60,22 +71,29 @@ test_kvm_stat() {
> > > test_kvm_record_report() {
> > > echo "Testing perf kvm record/report"
> > >
> > > - echo "Recording kvm profile for pid ${qemu_pid}..."
> > > - # Use --host to avoid needing guest symbols/mounts for this simple test
> > > - # We just want to verify the command runs and produces data
> > > - # We run in background and kill it because 'perf kvm record' appends options
> > > - # after the command, which breaks 'sleep' (e.g. it gets '-e cycles').
> > > - perf kvm --host record -p "${qemu_pid}" -o "${perfdata}" &
> > > - rec_pid=$!
> > > - sleep 1
> > > - kill -INT "${rec_pid}"
> > > - wait "${rec_pid}" || true
> > > + local duration
> > > + local success=false
> > > + for duration in 1 2 4 8; do
> > > + echo "Recording kvm profile for pid ${qemu_pid} (duration ${duration}s)..."
> > > + rm -f "${perfdata}" "${perfdata}".old
> > > +
> > > + perf kvm --host record -p "${qemu_pid}" -o "${perfdata}" &
> > > + local rec_pid=$!
> > > + sleep ${duration}
> > > + kill -INT "${rec_pid}"
> > > + wait "${rec_pid}" || true
> >
> > Can this be just like below?
> >
> > perf kvm --host record -p "${qemu_pid}" -o "${perfdata}" sleep ${duration}
>
> Right, looks equivalent and simpler,
>
> I was making notes to address the flakiness that is making me use the
> end summary of entries that fail (great feature) to run them in
> isolation to check if they work that way, which most do :-\
Here is an explanation from antigravity (sorry I have some typing
issues atm), but it comes down to how the command line is built up:
Normally, we could just write:
perf kvm --host record -p "qemuₚid" - o"{perfdata}" sleep
${duration}
However, on Intel x86 architectures, if no -e option is supplied
to perf kvm record , it calls architecture-specific setup logic (
__kvm_add_default_arch_event_x86() ) which automatically appends "-e"
and "cycles" to the end of the argv array.
Because perf record option parsing stops at the first non-option
(which is "sleep" ), any arguments appended after the workload name
are forwarded directly to the workload command. This turns the
executed command into:
sleep 1 -e cycles
Which causes sleep to crash with:
sleep: invalid option -- 'e'
I'll revise this in v4 to be clearer, but we should probably also fix
the option parsing issue.
Thanks,
Ian
> - Arnaldo
>
> >
> > Thanks,
> > Namhyung
> >
> > > +
> > > + if [ -e "${perfdata}" ] && \
> > > + perf kvm -i "${perfdata}" report --stdio 2>&1 | grep -q "Event count"; then
> > > + success=true
> > > + break
> > > + fi
> > > + echo "No samples or report failed, retrying..."
> > > + done
> > >
> > > - echo "Reporting kvm profile..."
> > > - # Check for some standard output from report
> > > - if ! perf kvm -i "${perfdata}" report --stdio 2>&1 | grep -q "Event count"; then
> > > + if [ "$success" = false ]; then
> > > echo "Failed to report kvm profile"
> > > - perf kvm -i "${perfdata}" report --stdio 2>&1
> > > + perf kvm -i "${perfdata}" report --stdio 2>&1 || true
> > > err=1
> > > return
> > > fi