Re: [PATCH] perf scripts python arm-cs-trace-disasm.py: Skip disasm if address continuity is broken

From: Ganapatrao Kulkarni
Date: Tue Jul 23 2024 - 11:27:20 EST




On 23-07-2024 06:40 pm, James Clark wrote:


On 22/07/2024 11:02 am, Ganapatrao Kulkarni wrote:

Hi James,

On 19-07-2024 08:09 pm, James Clark wrote:


On 19/07/2024 10:26 am, Ganapatrao Kulkarni wrote:
To generate the instruction tracing, script uses 2 contiguous packets
address range. If there a continuity brake due to discontiguous branch
address, it is required to reset the tracing and start tracing with the
new set of contiguous packets.

Adding change to identify the break and complete the remaining tracing
of current packets and restart tracing from new set of packets, if
continuity is established.


Hi Ganapatrao,

Can you add a before and after example of what's changed to the commit message? It wasn't immediately obvious to me if this is adding missing output, or it was correcting the tail end of the output that was previously wrong.

It is adding tail end of the trace as well avoiding the segfault of the perf application. With out this change the perf segfaults with as below log


./perf script --script=python:./scripts/python/arm-cs-trace-disasm.py -- -d objdump -k ../../vmlinux -v $* > dump
objdump: error: the stop address should be after the start address
Traceback (most recent call last):
   File "./scripts/python/arm-cs-trace-disasm.py", line 271, in process_event
     print_disam(dso_fname, dso_vm_start, start_addr, stop_addr)
   File "./scripts/python/arm-cs-trace-disasm.py", line 105, in print_disam
     for line in read_disam(dso_fname, dso_start, start_addr, stop_addr):
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   File "./scripts/python/arm-cs-trace-disasm.py", line 99, in read_disam
     disasm_output = check_output(disasm).decode('utf-8').split('\n')
                     ^^^^^^^^^^^^^^^^^^^^
   File "/usr/lib64/python3.12/subprocess.py", line 466, in check_output
     return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   File "/usr/lib64/python3.12/subprocess.py", line 571, in run
     raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['objdump', '-d', '-z', '--start-address=0xffff80008125b758', '--stop-address=0xffff80008125a934', '../../vmlinux']' returned non-zero exit status 1.
Fatal Python error: handler_call_die: problem in Python trace event handler
Python runtime state: initialized

Current thread 0x0000ffffb05054e0 (most recent call first):
   <no Python frame>

Extension modules: perf_trace_context, systemd._journal, systemd._reader, systemd.id128, report._py3report, _dbus_bindings, problem._py3abrt (total: 7)
Aborted (core dumped)


Signed-off-by: Ganapatrao Kulkarni <gankulkarni@xxxxxxxxxxxxxxxxxxxxxx>
---
  tools/perf/scripts/python/arm-cs-trace-disasm.py | 10 ++++++++++
  1 file changed, 10 insertions(+)

diff --git a/tools/perf/scripts/python/arm-cs-trace-disasm.py b/tools/perf/scripts/python/arm-cs-trace-disasm.py
index d973c2baed1c..ad10cee2c35e 100755
--- a/tools/perf/scripts/python/arm-cs-trace-disasm.py
+++ b/tools/perf/scripts/python/arm-cs-trace-disasm.py
@@ -198,6 +198,10 @@ def process_event(param_dict):
          cpu_data[str(cpu) + 'addr'] = addr
          return
+    if (cpu_data.get(str(cpu) + 'ip') == None):
+        cpu_data[str(cpu) + 'ip'] = ip
+

Do you need to write into the global cpu_data here? Doesn't it get overwritten after you load it back into 'prev_ip'

No, the logic is same as holding the addr of previous packet.
Saving the previous packet saved ip in to prev_ip before overwriting with the current packet.

It's not exactly the same logic as holding the addr of the previous sample. For addr, we return on the first None, with your change we now "pretend" that the second one is also the previous one:

  if (cpu_data.get(str(cpu) + 'addr') == None):
    cpu_data[str(cpu) + 'addr'] = addr
    return  <----------------------------sample 0 return

  if (cpu_data.get(str(cpu) + 'ip') == None):
      cpu_data[str(cpu) + 'ip'] = ip <---- sample 1 save but no return

Then for sample 1 'prev_ip' is actually now the 'current' IP:

Yes, it is dummy for first packet. Added anticipating that we wont hit the discontinuity for the first packet itself.

Can this be changed to more intuitive like below?

diff --git a/tools/perf/scripts/python/arm-cs-trace-disasm.py b/tools/perf/scripts/python/arm-cs-trace-disasm.py
index d973c2baed1c..d49f5090059f 100755
--- a/tools/perf/scripts/python/arm-cs-trace-disasm.py
+++ b/tools/perf/scripts/python/arm-cs-trace-disasm.py
@@ -198,6 +198,8 @@ def process_event(param_dict):
cpu_data[str(cpu) + 'addr'] = addr
return

+ if (cpu_data.get(str(cpu) + 'ip') != None):
+ prev_ip = cpu_data[str(cpu) + 'ip']

if (options.verbose == True):
print("Event type: %s" % name)
@@ -243,12 +245,18 @@ def process_event(param_dict):

# Record for previous sample packet
cpu_data[str(cpu) + 'addr'] = addr
+ cpu_data[str(cpu) + 'ip'] = stop_addr

# Handle CS_ETM_TRACE_ON packet if start_addr=0 and stop_addr=4
if (start_addr == 0 and stop_addr == 4):
print("CPU%d: CS_ETM_TRACE_ON packet is inserted" % cpu)
return

+ if (stop_addr < start_addr and prev_ip != 0):
+ # Continuity of the Packets broken, set start_addr to previous
+ # packet ip to complete the remaining tracing of the address range.
+ start_addr = prev_ip
+
if (start_addr < int(dso_start) or start_addr > int(dso_end)):
print("Start address 0x%x is out of range [ 0x%x .. 0x%x ] for dso %s" % (start_addr, int(dso_start), int(dso_end), dso))
return

Without this patch below is the failure log(with segfault) for reference.

[root@sut01sys-r214 perf]# timeout 4s ./perf record -e cs_etm// -C 1 dd if=/dev/zero of=/dev/null
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 1.087 MB perf.data ]
[root@sut01sys-r214 perf]# ./perf script --script=python:./scripts/python/arm-cs-trace-disasm.py -- -d objdump -k ../../vmlinux -v $* > dump
objdump: error: the stop address should be after the start address
Traceback (most recent call last):
File "./scripts/python/arm-cs-trace-disasm.py", line 271, in process_event
print_disam(dso_fname, dso_vm_start, start_addr, stop_addr)
File "./scripts/python/arm-cs-trace-disasm.py", line 105, in print_disam
for line in read_disam(dso_fname, dso_start, start_addr, stop_addr):
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "./scripts/python/arm-cs-trace-disasm.py", line 99, in read_disam
disasm_output = check_output(disasm).decode('utf-8').split('\n')
^^^^^^^^^^^^^^^^^^^^
File "/usr/lib64/python3.12/subprocess.py", line 466, in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib64/python3.12/subprocess.py", line 571, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['objdump', '-d', '-z', '--start-address=0xffff80008125b758', '--stop-address=0xffff80008125a934', '../../vmlinux']' returned non-zero exit status 1.
Fatal Python error: handler_call_die: problem in Python trace event handler
Python runtime state: initialized

Current thread 0x0000ffffb90d54e0 (most recent call first):
<no Python frame>

Extension modules: perf_trace_context, systemd._journal, systemd._reader, systemd.id128, report._py3report, _dbus_bindings, problem._py3abrt (total: 7)
Aborted (core dumped)


dump snippet:
============
Event type: branches
Sample = { cpu: 0001 addr: 0xffff80008030cb00 phys_addr: 0x0000000000000000 ip: 0xffff800080313f0c pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
ffff800080313f04 <__perf_event_header__init_id+0x4c>:
ffff800080313f04: 36100094 tbz w20, #2, ffff800080313f14 <__perf_event_header__init_id+0x5c>
ffff800080313f08: f941e6a0 ldr x0, [x21, #968]
ffff800080313f0c: d63f0000 blr x0
perf 12720/12720 [0001] 5986.372298040 __perf_event_header__init_id+0x54 .../coresight/linux/kernel/events/core.c 586 return event->clock();
Event type: branches
Sample = { cpu: 0001 addr: 0xffff8000801bb4a8 phys_addr: 0x0000000000000000 ip: 0xffff80008030cb0c pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
ffff80008030cb00 <local_clock>:
ffff80008030cb00: d503233f paciasp
ffff80008030cb04: a9bf7bfd stp x29, x30, [sp, #-16]!
ffff80008030cb08: 910003fd mov x29, sp
ffff80008030cb0c: 97faba67 bl ffff8000801bb4a8 <sched_clock>
perf 12720/12720 [0001] 5986.372298040 local_clock+0xc ...t/linux/./include/linux/sched/clock.h 64 return sched_clock();
Event type: branches
Sample = { cpu: 0001 addr: 0xffff80008125a8a8 phys_addr: 0x0000000000000000 ip: 0xffff8000801bb4c8 pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
ffff8000801bb4a8 <sched_clock>:
ffff8000801bb4a8: d503233f paciasp
ffff8000801bb4ac: a9be7bfd stp x29, x30, [sp, #-32]!
ffff8000801bb4b0: 910003fd mov x29, sp
ffff8000801bb4b4: a90153f3 stp x19, x20, [sp, #16]
ffff8000801bb4b8: d5384113 mrs x19, sp_el0
ffff8000801bb4bc: b9401260 ldr w0, [x19, #16]
ffff8000801bb4c0: 11000400 add w0, w0, #0x1
ffff8000801bb4c4: b9001260 str w0, [x19, #16]
ffff8000801bb4c8: 94427cf8 bl ffff80008125a8a8 <sched_clock_noinstr>
perf 12720/12720 [0001] 5986.372298040 sched_clock+0x20 ...sight/linux/kernel/time/sched_clock.c 105 ns = sched_clock_noinstr();
Event type: branches
Sample = { cpu: 0001 addr: 0xffff80008125b758 phys_addr: 0x0000000000000000 ip: 0xffff80008125a8e4 pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
ffff80008125a8a8 <sched_clock_noinstr>:
ffff80008125a8a8: d503233f paciasp
ffff80008125a8ac: a9bc7bfd stp x29, x30, [sp, #-64]!
ffff80008125a8b0: 910003fd mov x29, sp
ffff80008125a8b4: a90153f3 stp x19, x20, [sp, #16]
ffff80008125a8b8: b000e354 adrp x20, ffff800082ec3000 <tick_bc_dev+0x140>
ffff80008125a8bc: 910d0294 add x20, x20, #0x340
ffff80008125a8c0: a90363f7 stp x23, x24, [sp, #48]
ffff80008125a8c4: 91002297 add x23, x20, #0x8
ffff80008125a8c8: 52800518 mov w24, #0x28 // #40
ffff80008125a8cc: a9025bf5 stp x21, x22, [sp, #32]
ffff80008125a8d0: b9400296 ldr w22, [x20]
ffff80008125a8d4: 120002d5 and w21, w22, #0x1
ffff80008125a8d8: 9bb87eb5 umull x21, w21, w24
ffff80008125a8dc: 8b1502f3 add x19, x23, x21
ffff80008125a8e0: f9400e60 ldr x0, [x19, #24]
ffff80008125a8e4: d63f0000 blr x0
perf 12720/12720 [0001] 5986.372298040 sched_clock_noinstr+0x3c ...sight/linux/kernel/time/sched_clock.c 93 cyc = (rd->read_sched_clock() - rd->epoch_cyc) &
Event type: branches
Sample = { cpu: 0001 addr: 0xffff8000801bb4cc phys_addr: 0x0000000000000000 ip: 0xffff80008125a930 pid: 12720 tid: 12720 period: 1 time: 5986372298040 }


With fix:
=========

Event type: branches
Sample = { cpu: 0001 addr: 0xffff80008030cb00 phys_addr: 0x0000000000000000 ip: 0xffff800080313f0c pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
ffff800080313f04 <__perf_event_header__init_id+0x4c>:
ffff800080313f04: 36100094 tbz w20, #2, ffff800080313f14 <__perf_event_header__init_id+0x5c>
ffff800080313f08: f941e6a0 ldr x0, [x21, #968]
ffff800080313f0c: d63f0000 blr x0
perf 12720/12720 [0001] 5986.372298040 __perf_event_header__init_id+0x54 .../coresight/linux/kernel/events/core.c 586 return event->clock();
Event type: branches
Sample = { cpu: 0001 addr: 0xffff8000801bb4a8 phys_addr: 0x0000000000000000 ip: 0xffff80008030cb0c pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
ffff80008030cb00 <local_clock>:
ffff80008030cb00: d503233f paciasp
ffff80008030cb04: a9bf7bfd stp x29, x30, [sp, #-16]!
ffff80008030cb08: 910003fd mov x29, sp
ffff80008030cb0c: 97faba67 bl ffff8000801bb4a8 <sched_clock>
perf 12720/12720 [0001] 5986.372298040 local_clock+0xc ...t/linux/./include/linux/sched/clock.h 64 return sched_clock();
Event type: branches
Sample = { cpu: 0001 addr: 0xffff80008125a8a8 phys_addr: 0x0000000000000000 ip: 0xffff8000801bb4c8 pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
ffff8000801bb4a8 <sched_clock>:
ffff8000801bb4a8: d503233f paciasp
ffff8000801bb4ac: a9be7bfd stp x29, x30, [sp, #-32]!
ffff8000801bb4b0: 910003fd mov x29, sp
ffff8000801bb4b4: a90153f3 stp x19, x20, [sp, #16]
ffff8000801bb4b8: d5384113 mrs x19, sp_el0
ffff8000801bb4bc: b9401260 ldr w0, [x19, #16]
ffff8000801bb4c0: 11000400 add w0, w0, #0x1
ffff8000801bb4c4: b9001260 str w0, [x19, #16]
ffff8000801bb4c8: 94427cf8 bl ffff80008125a8a8 <sched_clock_noinstr>
perf 12720/12720 [0001] 5986.372298040 sched_clock+0x20 ...sight/linux/kernel/time/sched_clock.c 105 ns = sched_clock_noinstr();
Event type: branches
Sample = { cpu: 0001 addr: 0xffff80008125b758 phys_addr: 0x0000000000000000 ip: 0xffff80008125a8e4 pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
ffff80008125a8a8 <sched_clock_noinstr>:
ffff80008125a8a8: d503233f paciasp
ffff80008125a8ac: a9bc7bfd stp x29, x30, [sp, #-64]!
ffff80008125a8b0: 910003fd mov x29, sp
ffff80008125a8b4: a90153f3 stp x19, x20, [sp, #16]
ffff80008125a8b8: b000e354 adrp x20, ffff800082ec3000 <tick_bc_dev+0x140>
ffff80008125a8bc: 910d0294 add x20, x20, #0x340
ffff80008125a8c0: a90363f7 stp x23, x24, [sp, #48]
ffff80008125a8c4: 91002297 add x23, x20, #0x8
ffff80008125a8c8: 52800518 mov w24, #0x28 // #40
ffff80008125a8cc: a9025bf5 stp x21, x22, [sp, #32]
ffff80008125a8d0: b9400296 ldr w22, [x20]
ffff80008125a8d4: 120002d5 and w21, w22, #0x1
ffff80008125a8d8: 9bb87eb5 umull x21, w21, w24
ffff80008125a8dc: 8b1502f3 add x19, x23, x21
ffff80008125a8e0: f9400e60 ldr x0, [x19, #24]
ffff80008125a8e4: d63f0000 blr x0
perf 12720/12720 [0001] 5986.372298040 sched_clock_noinstr+0x3c ...sight/linux/kernel/time/sched_clock.c 93 cyc = (rd->read_sched_clock() - rd->epoch_cyc) &
Event type: branches
Sample = { cpu: 0001 addr: 0xffff8000801bb4cc phys_addr: 0x0000000000000000 ip: 0xffff80008125a930 pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
ffff80008125a8e8 <sched_clock_noinstr+0x40>:
ffff80008125a8e8: f8756ae3 ldr x3, [x23, x21]
ffff80008125a8ec: a9409666 ldp x6, x5, [x19, #8]
ffff80008125a8f0: 29441261 ldp w1, w4, [x19, #32]
ffff80008125a8f4: d50339bf dmb ishld
ffff80008125a8f8: b9400282 ldr w2, [x20]
ffff80008125a8fc: 6b16005f cmp w2, w22
ffff80008125a900: 54fffe81 b.ne ffff80008125a8d0 <sched_clock_noinstr+0x28> // b.any
ffff80008125a904: cb060000 sub x0, x0, x6
ffff80008125a908: 2a0103e1 mov w1, w1
ffff80008125a90c: 8a050000 and x0, x0, x5
ffff80008125a910: a94153f3 ldp x19, x20, [sp, #16]
ffff80008125a914: 9b017c00 mul x0, x0, x1
ffff80008125a918: a9425bf5 ldp x21, x22, [sp, #32]
ffff80008125a91c: a94363f7 ldp x23, x24, [sp, #48]
ffff80008125a920: 9ac42400 lsr x0, x0, x4
ffff80008125a924: a8c47bfd ldp x29, x30, [sp], #64
ffff80008125a928: d50323bf autiasp
ffff80008125a92c: 8b030000 add x0, x0, x3
ffff80008125a930: d65f03c0 ret
perf 12720/12720 [0001] 5986.372298040 sched_clock_noinstr+0x88 ...sight/linux/kernel/time/sched_clock.c 99 }
Event type: branches
Sample = { cpu: 0001 addr: 0xffff8000801bb4ec phys_addr: 0x0000000000000000 ip: 0xffff8000801bb4e4 pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
ffff8000801bb4cc <sched_clock+0x24>:
ffff8000801bb4cc: aa0003f4 mov x20, x0
ffff8000801bb4d0: f9400a61 ldr x1, [x19, #16]
ffff8000801bb4d4: d1000421 sub x1, x1, #0x1
ffff8000801bb4d8: b9001261 str w1, [x19, #16]
ffff8000801bb4dc: b4000061 cbz x1, ffff8000801bb4e8 <sched_clock+0x40>
ffff8000801bb4e0: f9400a60 ldr x0, [x19, #16]
ffff8000801bb4e4: b5000040 cbnz x0, ffff8000801bb4ec <sched_clock+0x44>
perf 12720/12720 [0001] 5986.372298040 sched_clock+0x3c ...ux/./arch/arm64/include/asm/preempt.h 74 return !pc || !READ_ONCE(ti->preempt_count);
Event type: branches
Sample = { cpu: 0001 addr: 0xffff80008030cb10 phys_addr: 0x0000000000000000 ip: 0xffff8000801bb4fc pid: 12720 tid: 12720 period: 1 time: 5986372298040 }
ffff8000801bb4ec <sched_clock+0x44>:
ffff8000801bb4ec: aa1403e0 mov x0, x20
ffff8000801bb4f0: a94153f3 ldp x19, x20, [sp, #16]
ffff8000801bb4f4: a8c27bfd ldp x29, x30, [sp], #32
ffff8000801bb4f8: d50323bf autiasp
ffff8000801bb4fc: d65f03c0 ret
perf 12720/12720 [0001] 5986.372298040 sched_clock+0x54 ...sight/linux/kernel/time/sched_clock.c 108 }

Still we miss tracing of 0xffff80008125b758, however seg-fault is avoided.

  prev_ip = cpu_data[str(cpu) + 'ip']

This means that prev_ip is sometimes the previous sample's IP only sometimes (samples following 1), otherwise it's the current IP. Does your fix actually require this bit? Because we already save the 'real' previous one:

  cpu_data[str(cpu) + 'ip'] = stop_addr

Also normally we save ip + 4 (stop_addr), where as you save ip. It's not clear why there is no need to add the 4?



   prev_ip = cpu_data[str(cpu) + 'ip']

   ... then ...

   # Record for previous sample packet
   cpu_data[str(cpu) + 'addr'] = addr
   cpu_data[str(cpu) + 'ip'] = stop_addr

Would a local variable not accomplish the same thing?

No, We need global to hold the ip of previous packet.

+    prev_ip = cpu_data[str(cpu) + 'ip']
      if (options.verbose == True):
          print("Event type: %s" % name)
@@ -243,12 +247,18 @@ def process_event(param_dict):
      # Record for previous sample packet
      cpu_data[str(cpu) + 'addr'] = addr
+    cpu_data[str(cpu) + 'ip'] = stop_addr
      # Handle CS_ETM_TRACE_ON packet if start_addr=0 and stop_addr=4
      if (start_addr == 0 and stop_addr == 4):
          print("CPU%d: CS_ETM_TRACE_ON packet is inserted" % cpu)
          return
+    if (stop_addr < start_addr):
+        # Continuity of the Packets broken, set start_addr to previous
+        # packet ip to complete the remaining tracing of the address range.

After looking a bit more I'm also not sure why stop_addr < start_addr signifies a discontinuity. What if the discontinuity ends up with stop_addr > start_addr? There's no reason it can't jump forwards as well as backwards.

Can you share the 3 samples from the --verbose output to the script that cause the issue?

I see discontinuities as having the branch source (ip) set to 0 which is what we do at the start:

   Sample = { cpu: 0000 addr: 0x0000ffffb807adac phys_addr: 0x0000000000000000 ip: 0x0000000000000000 pid: 28388 }

Then the ending one has the branch target (addr) set to 0:

  Sample = { cpu: 0000 addr: 0x0000000000000000 phys_addr: 0x0000000000000000 ip: 0x0000ffffb7eee168 pid: 28388 }


And it doesn't hit objdump because of the range check:

 Start address 0x0 is out of range ...

So I don't see any missing disassembly or crashes for this.

+        start_addr = prev_ip
+
      if (start_addr < int(dso_start) or start_addr > int(dso_end)):
          print("Start address 0x%x is out of range [ 0x%x .. 0x%x ] for dso %s" % (start_addr, int(dso_start), int(dso_end), dso))
          return

Thanks,
Ganapat

Thanks,
Ganapat