Re: BISECTED: perf test -v "Test data symbol" failing on Intel Hybrid systems

From: Ian Rogers
Date: Thu Apr 11 2024 - 14:54:37 EST


On Thu, Apr 11, 2024 at 11:02 AM Arnaldo Carvalho de Melo
<acme@xxxxxxxxxx> wrote:
>
> On Tue, Apr 09, 2024 at 02:28:43PM -0700, Ian Rogers wrote:
> > On Tue, Apr 9, 2024 at 12:12 PM Arnaldo Carvalho de Melo
> > <acme@xxxxxxxxxx> wrote:
> > >
> > > root@x1:~# grep -m1 "model name" /proc/cpuinfo
> > > model name : 13th Gen Intel(R) Core(TM) i7-1365U
> > > root@x1:~# uname -a
> > > Linux x1 6.7.11-200.fc39.x86_64 #1 SMP PREEMPT_DYNAMIC Wed Mar 27 16:50:39 UTC 2024 x86_64 GNU/Linux
> > > root@x1:~#
> > >
> > > Bisected down to:
> > >
> > > commit 5752c20f3787c9bc9ff9411a70b3d41add85518c
> > > Author: Ravi Bangoria <ravi.bangoria@xxxxxxx>
> > > Date: Thu Jun 15 10:47:00 2023 +0530
> > >
> > > perf mem: Scan all PMUs instead of just core ones
> > >
> > > Scanning only core PMUs is not sufficient on platforms like AMD since
> > > perf mem on AMD uses IBS OP PMU, which is independent of core PMU.
> > > Scan all PMUs instead of just core PMUs. There should be negligible
> > > performance overhead because of scanning all PMUs, so we should be okay.
> > >
> > > Reviewed-by: Ian Rogers <irogers@xxxxxxxxxx>
> > > Signed-off-by: Ravi Bangoria <ravi.bangoria@xxxxxxx>
> > >
> > > root@x1:~# perf -v ; perf test -v "Test data symbol"
> > > perf version 6.4.rc3.g5752c20f3787
> > > 111: Test data symbol :
> > > --- start ---
> > > test child forked, pid 522202
> > > Recording workload...
> > > malloc_consolidate(): invalid chunk size
> > > /home/acme/libexec/perf-core/tests/shell/test_data_symbol.sh: line 60: 522208 Aborted (core dumped) perf mem record --all-user -o ${PERF_DATA} -- $TEST_PROGRAM
> > > /home/acme/libexec/perf-core/tests/shell/test_data_symbol.sh: line 62: kill: (522208) - No such process
> > > Cleaning up files...
> > > test child finished with -1
> > > ---- end ----
> > > Test data symbol: FAILED!
> > > root@x1:~# perf -v ; perf test -v "Test data symbol"
> > > perf version 6.8.g63c22868714b
> > > 116: Test data symbol:
> > > --- start ---
> > > test child forked, pid 526540
> > > 954300-954339 l buf1
> > > perf does have symbol 'buf1'
> > > Recording workload...
> > > Waiting for "perf record has started" message
> > > OK
> > > Cleaning up files...
> > > ---- end(-1) ----
> > > 116: Test data symbol : FAILED!
> > > root@x1:~#
> > >
> > > Further details:
> > >
> > > (gdb) run mem record --all-user sleep 1
> > > Starting program: /root/bin/perf mem record --all-user sleep 1
> > >
> > >
> > > This GDB supports auto-downloading debuginfo from the following URLs:
> > > <https://debuginfod.fedoraproject.org/>
> > > Enable debuginfod for this session? (y or [n]) y
> > > Debuginfod has been enabled.
> > > To make this setting permanent, add 'set debuginfod enabled on' to .gdbinit.
> > > [Thread debugging using libthread_db enabled]
> > > Using host libthread_db library "/lib64/libthread_db.so.1".
> > >
> > > Program received signal SIGSEGV, Segmentation fault.
> > > 0x0000000000611860 in perf_mem_events__record_args (rec_argv=0xea8280, argv_nr=0x7fffffffd6bc, rec_tmp=0xebbda0, tmp_nr=0x7fffffffd6c0) at util/mem-events.c:213
> > > 213 s = perf_mem_events__name(j, pmu->name);
> > > (gdb) bt
> > > #0 0x0000000000611860 in perf_mem_events__record_args (rec_argv=0xea8280, argv_nr=0x7fffffffd6bc, rec_tmp=0xebbda0, tmp_nr=0x7fffffffd6c0) at util/mem-events.c:213
> > > #1 0x000000000045c47f in __cmd_record (argc=2, argv=0x7fffffffe420, mem=0x7fffffffda20) at builtin-mem.c:152
> > > #2 0x000000000045d69b in cmd_mem (argc=4, argv=0x7fffffffe420) at builtin-mem.c:514
> > > #3 0x00000000004ffe38 in run_builtin (p=0xe08aa0 <commands+672>, argc=5, argv=0x7fffffffe420) at perf.c:323
> > > #4 0x00000000005000ac in handle_internal_command (argc=5, argv=0x7fffffffe420) at perf.c:377
> > > #5 0x00000000005001fb in run_argv (argcp=0x7fffffffe23c, argv=0x7fffffffe230) at perf.c:421
> > > #6 0x00000000005004e8 in main (argc=5, argv=0x7fffffffe420) at perf.c:537
> > > (gdb) list -5
> > > file: "arch/x86/util/mem-events.c", line number: 208, symbol: "???"
> > > Line number 203 out of range; arch/x86/util/mem-events.c has 93 lines.
> > > (gdb)
>
> > >
> > >
> > > (gdb) list -5
> > > file: "arch/x86/util/mem-events.c", line number: 208, symbol: "???"
> > > Line number 203 out of range; arch/x86/util/mem-events.c has 93 lines.
> > > (gdb) p j
> > > $1 = 0
> > > (gdb) p pmu->name
> > > Cannot access memory at address 0x64ffffff9c
> > > (gdb)
> > >
> > > 183 int perf_mem_events__record_args(const char **rec_argv, int *argv_nr,
> > > 184 char **rec_tmp, int *tmp_nr)
> > > 185 {
> > > 186 int i = *argv_nr, k = 0;
> > > 187 struct perf_mem_event *e;
> > > 188 struct perf_pmu *pmu;
> > > 189 char *s;
> > > 190
> > > 191 for (int j = 0; j < PERF_MEM_EVENTS__MAX; j++) {
> > > 192 e = perf_mem_events__ptr(j);
> > > 193 if (!e->record)
> > > 194 continue;
> > > 195
> > > 196 if (perf_pmus__num_mem_pmus() == 1) {
> > > 197 if (!e->supported) {
> > > 198 pr_err("failed: event '%s' not supported\n",
> > > 199 perf_mem_events__name(j, NULL));
> > > 200 return -1;
> > > 201 }
> > > 202
> > > 203 rec_argv[i++] = "-e";
> > > 204 rec_argv[i++] = perf_mem_events__name(j, NULL);
> > > 205 } else {
> > > 206 if (!e->supported) {
> > > 207 perf_mem_events__print_unsupport_hybrid(e, j);
> > > 208 return -1;
> > > 209 }
> > > 210
> > > 211 while ((pmu = perf_pmus__scan(pmu)) != NULL) {
> > > 212 rec_argv[i++] = "-e";
> > > 213 s = perf_mem_events__name(j, pmu->name);
> > > 214 if (s) {
> > > 215 s = strdup(s);
> >
> > This looks like something that address/memory sanitizers could help with.
>
> I'm I forgetting something?

I think it is just that the shell script pipes the output and so we
don't see it in verbose mode. I'd add a "set -x" to
tools/perf/tests/shell/test_data_symbol.sh so you can run the failing
command without the redirects/pipes/..

Thanks,
Ian

> ⬢[acme@toolbox perf-tools-next]$ rm -rf ~/libexec/perf-core/ ; make -k CORESIGHT=1 EXTRA_CFLAGS="-fsanitize=address" O=/tmp/build/$(basename $PWD)/ -C tools/perf install-bin
>
> root@x1:~# perf test -v 116
> 116: Test data symbol:
> --- start ---
> test child forked, pid 614595
> f87540-f87579 l buf1
> perf does have symbol 'buf1'
> Recording workload...
> Waiting for "perf record has started" message
> OK
> Cleaning up files...
> ---- end(-1) ----
> 116: Test data symbol : FAILED!
> root@x1:~# ldd ~/bin/perf | grep asan
> libasan.so.8 => /lib64/libasan.so.8 (0x00007f8eb7a00000)
> root@x1:~#