Re: BISECTED: perf test -v "Test data symbol" failing on Intel Hybrid systems

From: Arnaldo Carvalho de Melo
Date: Thu Apr 11 2024 - 14:31:42 EST


On Tue, Apr 09, 2024 at 02:28:43PM -0700, Ian Rogers wrote:
> On Tue, Apr 9, 2024 at 12:12 PM Arnaldo Carvalho de Melo
> <acme@xxxxxxxxxx> wrote:
> >
> > root@x1:~# grep -m1 "model name" /proc/cpuinfo
> > model name : 13th Gen Intel(R) Core(TM) i7-1365U
> > root@x1:~# uname -a
> > Linux x1 6.7.11-200.fc39.x86_64 #1 SMP PREEMPT_DYNAMIC Wed Mar 27 16:50:39 UTC 2024 x86_64 GNU/Linux
> > root@x1:~#
> >
> > Bisected down to:
> >
> > commit 5752c20f3787c9bc9ff9411a70b3d41add85518c
> > Author: Ravi Bangoria <ravi.bangoria@xxxxxxx>
> > Date: Thu Jun 15 10:47:00 2023 +0530
> >
> > perf mem: Scan all PMUs instead of just core ones
> >
> > Scanning only core PMUs is not sufficient on platforms like AMD since
> > perf mem on AMD uses IBS OP PMU, which is independent of core PMU.
> > Scan all PMUs instead of just core PMUs. There should be negligible
> > performance overhead because of scanning all PMUs, so we should be okay.
> >
> > Reviewed-by: Ian Rogers <irogers@xxxxxxxxxx>
> > Signed-off-by: Ravi Bangoria <ravi.bangoria@xxxxxxx>
> >
> > root@x1:~# perf -v ; perf test -v "Test data symbol"
> > perf version 6.4.rc3.g5752c20f3787
> > 111: Test data symbol :
> > --- start ---
> > test child forked, pid 522202
> > Recording workload...
> > malloc_consolidate(): invalid chunk size
> > /home/acme/libexec/perf-core/tests/shell/test_data_symbol.sh: line 60: 522208 Aborted (core dumped) perf mem record --all-user -o ${PERF_DATA} -- $TEST_PROGRAM
> > /home/acme/libexec/perf-core/tests/shell/test_data_symbol.sh: line 62: kill: (522208) - No such process
> > Cleaning up files...
> > test child finished with -1
> > ---- end ----
> > Test data symbol: FAILED!
> > root@x1:~# perf -v ; perf test -v "Test data symbol"
> > perf version 6.8.g63c22868714b
> > 116: Test data symbol:
> > --- start ---
> > test child forked, pid 526540
> > 954300-954339 l buf1
> > perf does have symbol 'buf1'
> > Recording workload...
> > Waiting for "perf record has started" message
> > OK
> > Cleaning up files...
> > ---- end(-1) ----
> > 116: Test data symbol : FAILED!
> > root@x1:~#
> >
> > Further details:
> >
> > (gdb) run mem record --all-user sleep 1
> > Starting program: /root/bin/perf mem record --all-user sleep 1
> >
> >
> > This GDB supports auto-downloading debuginfo from the following URLs:
> > <https://debuginfod.fedoraproject.org/>
> > Enable debuginfod for this session? (y or [n]) y
> > Debuginfod has been enabled.
> > To make this setting permanent, add 'set debuginfod enabled on' to .gdbinit.
> > [Thread debugging using libthread_db enabled]
> > Using host libthread_db library "/lib64/libthread_db.so.1".
> >
> > Program received signal SIGSEGV, Segmentation fault.
> > 0x0000000000611860 in perf_mem_events__record_args (rec_argv=0xea8280, argv_nr=0x7fffffffd6bc, rec_tmp=0xebbda0, tmp_nr=0x7fffffffd6c0) at util/mem-events.c:213
> > 213 s = perf_mem_events__name(j, pmu->name);
> > (gdb) bt
> > #0 0x0000000000611860 in perf_mem_events__record_args (rec_argv=0xea8280, argv_nr=0x7fffffffd6bc, rec_tmp=0xebbda0, tmp_nr=0x7fffffffd6c0) at util/mem-events.c:213
> > #1 0x000000000045c47f in __cmd_record (argc=2, argv=0x7fffffffe420, mem=0x7fffffffda20) at builtin-mem.c:152
> > #2 0x000000000045d69b in cmd_mem (argc=4, argv=0x7fffffffe420) at builtin-mem.c:514
> > #3 0x00000000004ffe38 in run_builtin (p=0xe08aa0 <commands+672>, argc=5, argv=0x7fffffffe420) at perf.c:323
> > #4 0x00000000005000ac in handle_internal_command (argc=5, argv=0x7fffffffe420) at perf.c:377
> > #5 0x00000000005001fb in run_argv (argcp=0x7fffffffe23c, argv=0x7fffffffe230) at perf.c:421
> > #6 0x00000000005004e8 in main (argc=5, argv=0x7fffffffe420) at perf.c:537
> > (gdb) list -5
> > file: "arch/x86/util/mem-events.c", line number: 208, symbol: "???"
> > Line number 203 out of range; arch/x86/util/mem-events.c has 93 lines.
> > (gdb)

> >
> >
> > (gdb) list -5
> > file: "arch/x86/util/mem-events.c", line number: 208, symbol: "???"
> > Line number 203 out of range; arch/x86/util/mem-events.c has 93 lines.
> > (gdb) p j
> > $1 = 0
> > (gdb) p pmu->name
> > Cannot access memory at address 0x64ffffff9c
> > (gdb)
> >
> > 183 int perf_mem_events__record_args(const char **rec_argv, int *argv_nr,
> > 184 char **rec_tmp, int *tmp_nr)
> > 185 {
> > 186 int i = *argv_nr, k = 0;
> > 187 struct perf_mem_event *e;
> > 188 struct perf_pmu *pmu;
> > 189 char *s;
> > 190
> > 191 for (int j = 0; j < PERF_MEM_EVENTS__MAX; j++) {
> > 192 e = perf_mem_events__ptr(j);
> > 193 if (!e->record)
> > 194 continue;
> > 195
> > 196 if (perf_pmus__num_mem_pmus() == 1) {
> > 197 if (!e->supported) {
> > 198 pr_err("failed: event '%s' not supported\n",
> > 199 perf_mem_events__name(j, NULL));
> > 200 return -1;
> > 201 }
> > 202
> > 203 rec_argv[i++] = "-e";
> > 204 rec_argv[i++] = perf_mem_events__name(j, NULL);
> > 205 } else {
> > 206 if (!e->supported) {
> > 207 perf_mem_events__print_unsupport_hybrid(e, j);
> > 208 return -1;
> > 209 }
> > 210
> > 211 while ((pmu = perf_pmus__scan(pmu)) != NULL) {
> > 212 rec_argv[i++] = "-e";
> > 213 s = perf_mem_events__name(j, pmu->name);
> > 214 if (s) {
> > 215 s = strdup(s);
>
> This looks like something that address/memory sanitizers could help with.

I'm I forgetting something?

⬢[acme@toolbox perf-tools-next]$ rm -rf ~/libexec/perf-core/ ; make -k CORESIGHT=1 EXTRA_CFLAGS="-fsanitize=address" O=/tmp/build/$(basename $PWD)/ -C tools/perf install-bin

root@x1:~# perf test -v 116
116: Test data symbol:
--- start ---
test child forked, pid 614595
f87540-f87579 l buf1
perf does have symbol 'buf1'
Recording workload...
Waiting for "perf record has started" message
OK
Cleaning up files...
---- end(-1) ----
116: Test data symbol : FAILED!
root@x1:~# ldd ~/bin/perf | grep asan
libasan.so.8 => /lib64/libasan.so.8 (0x00007f8eb7a00000)
root@x1:~#