Re: [PATCH 2/2] perf tests: add test for trace output loss

From: Benjamin Peterson
Date: Thu Nov 07 2024 - 18:18:21 EST


On Thu, Nov 7, 2024 at 2:07 PM Howard Chu <howardchu95@xxxxxxxxx> wrote:
>
> Hello,
>
> On Wed, Nov 6, 2024 at 3:46 PM Benjamin Peterson <benjamin@xxxxxxxxxxx> wrote:
> >
> > Add a test that checks that trace output is not lost to races. This is
> > accomplished by tracing the exit_group syscall of "true" multiple times and
> > checking for correct output.
> >
> > Conveniently, this test also serves as a regression test for 5fb8e56542a3 ("perf
> > trace: avoid garbage when not printing a trace event's arguments") because
> > exit_group triggers the previously buggy printing behavior.
> >
> > Signed-off-by: Benjamin Peterson <benjamin@xxxxxxxxxxx>
> > ---
> > tools/perf/tests/shell/trace_exit_race.sh | 31 +++++++++++++++++++++++
> > 1 file changed, 31 insertions(+)
> > create mode 100755 tools/perf/tests/shell/trace_exit_race.sh
> >
> > diff --git a/tools/perf/tests/shell/trace_exit_race.sh b/tools/perf/tests/shell/trace_exit_race.sh
> > new file mode 100755
> > index 000000000000..8b70324bc5b4
> > --- /dev/null
> > +++ b/tools/perf/tests/shell/trace_exit_race.sh
> > @@ -0,0 +1,31 @@
> > +#!/bin/sh
> > +# perf trace exit race
> > +# SPDX-License-Identifier: GPL-2.0
> > +
> > +# Check that the last events of a perf trace'd subprocess are not
> > +# lost. Specifically, trace the exiting syscall of "true" 100 times and ensure
> > +# the output contains 100 correct lines.
> > +
> > +# shellcheck source=lib/probe.sh
> > +. "$(dirname $0)"/lib/probe.sh
> > +
> > +skip_if_no_perf_trace || exit 2
> > +
> > +trace_shutdown_race() {
> > + for i in $(seq 100); do
> > + perf trace -e syscalls:sys_enter_exit_group true 2>>$file
> > + done
> > + [ $(grep -c -E " +[0-9]+\.[0-9]+ +true/[0-9]+ syscalls:sys_enter_exit_group\(\)$" $file) = "100" ]
>
> The test failed due to regex mismatched, I think because of this:
>
> 0.000 true/1526046 syscalls:sys_enter_exit_group( )
> 0.000 true/1526212 syscalls:sys_enter_exit_group( )
> 0.000 true/1526383 syscalls:sys_enter_exit_group(SSSSSSSSSSSSSSSS)
>
> Ironically the junk buffer problem you fixed last time, thanks for
> finding another printing problem in perf trace :), I'll figure out the
> cause of this.

Thanks, the problem is a parallel issue with trace__fprintf_sys_enter.
I will include a patch for the problem in my spin of this series.

>
> Another thing is this test takes a long time to finish
>
> perf $ time ./perf test 109
> 109: perf trace exit race : FAILED!
>
> real 0m38.762s
> user 0m15.090s
> sys 0m21.794s
>
> Is it really necessary to run it 100 times? To me it seems to be just
> a wrong handling logic of draining samples, will there be coincidence?

Yes, it's racy. In my testing, sometimes the correct output would
appear. However, I will reduce the number of iterations to 10.