Re: [PATCH v2 11/11] perf python tracepoint: Switch to using parse_events

From: Arnaldo Carvalho de Melo
Date: Tue Mar 11 2025 - 17:52:59 EST


On Tue, Mar 11, 2025 at 03:49:37PM -0300, Arnaldo Carvalho de Melo wrote:
> So it seems to be something just in the python binding, as perf trace
> seems to handle it well:
>
> ( field 'prev_comm' ret=0x7f7c31f65110, raw_size=68 ) ( field 'prev_pid' ret=0x7f7c23b1bed0, raw_size=68 ) ( field 'prev_prio' ret=0x7f7c239c0030, raw_size=68 ) ( field 'prev_state' ret=0x7f7c239c0250, raw_size=68 ) time 14771421785867 prev_comm= prev_pid=1919907691 prev_prio=796026219 prev_state=0x303a32313175 ==>
> ( XXX '��' len=16, raw_size=68) ( field 'next_comm' ret=(nil), raw_size=68 ) Traceback (most recent call last):
> File "/home/acme/git/perf-tools-next/tools/perf/python/tracepoint.py", line 51, in <module>
> main()
> File "/home/acme/git/perf-tools-next/tools/perf/python/tracepoint.py", line 46, in main
> event.next_comm,
> ^^^^^^^^^^^^^^^
> AttributeError: 'perf.sample_event' object has no attribute 'next_comm'
> root@number:/home/acme/git/perf-tools-next# cat /proc/125355/comm
> kworker/u112:0-i915
> root@number:/home/acme/git/perf-tools-next#
> root@number:/home/acme/git/perf-tools-next#
> root@number:/home/acme/git/perf-tools-next# perf trace -e sched:sched_switch -p 125355
> 0.000 sched:sched_switch(prev_comm: "kworker/u112:0", prev_pid: 125355 (kworker/u112:0-), prev_prio: 120, prev_state: 128, next_comm: "swapper/6", next_prio: 120)
> ^Croot@number:/home/acme/git/perf-tools-next#
>
> I.e. it chops up the prev_comm size to what is specified in the
> tracepoint format.
>
> And that sample->raw_size is the same accross the sched:sched_switch
> raw_datas (seemingly suboptimal, most are less than 16 bytes, but
> probably its not guaranteed that the \0 will be there, so copy all the
> 16 bytes).
>
> Now to try to figure out why simply using PyUnicode_FromStringAndSize
> doesn't work...

Didn't manage to make progress on this, I spent more time than I
expected as I think this could be some sort of canary on some coal mine,
but with the patch below, that gives up and just avoids touching the
COMM fields and don't switch from string to bytearray in the binding, it
runs forever, this is just a data point in case somebody wants to
pursue.

That flipping from string to not string based on just one entry not
being acceptable is questionable, and I think it should go away, but why
when COMM fields are bigger what is alloted to them in the tracepoint
ends up tripping up just the python binding is something I couldn't
grasp in today's session.

Namhyung, this is something open, but not caused by Ian's patchset, for
which I give my:

Tested-by: Arnaldo Carvalho de Melo <acme@xxxxxxxxxx>

In addition to the tags I provided patch by patch.

- Arnaldo

diff --git a/tools/perf/python/tracepoint.py b/tools/perf/python/tracepoint.py
index 38b2b6d11f64566a..965b50afbdafeeb2 100755
--- a/tools/perf/python/tracepoint.py
+++ b/tools/perf/python/tracepoint.py
@@ -33,15 +33,12 @@ def main():
if not isinstance(event, perf.sample_event):
continue

- print("time %u prev_comm=%s prev_pid=%d prev_prio=%d prev_state=0x%x ==> next_comm=%s next_pid=%d next_prio=%d" % (
- event.sample_time,
- event.prev_comm,
- event.prev_pid,
- event.prev_prio,
- event.prev_state,
- event.next_comm,
- event.next_pid,
- event.next_prio))
+ try:
+ print("time %u prev_comm=%s prev_pid=%d prev_prio=%d prev_state=0x%x ==> next_comm=%s next_pid=%d next_prio=%d" % (
+ event.sample_time, event.prev_comm, event.prev_pid, event.prev_prio, event.prev_state, event.next_comm, event.next_pid, event.next_prio))
+ except:
+ print("time %u prev_pid=%d prev_prio=%d prev_state=0x%x ==> next_pid=%d next_prio=%d" % (
+ event.sample_time, event.prev_pid, event.prev_prio, event.prev_state, event.next_pid, event.next_prio))

if __name__ == '__main__':
main()
diff --git a/tools/perf/util/python.c b/tools/perf/util/python.c
index 6c5bb5e8893998ae..3eb77bd270077cb3 100644
--- a/tools/perf/util/python.c
+++ b/tools/perf/util/python.c
@@ -318,13 +318,10 @@ tracepoint_field(const struct pyrf_event *pe, struct tep_format_field *field)
if (tep_field_is_relative(field->flags))
offset += field->offset + field->size;
}
- if (field->flags & TEP_FIELD_IS_STRING &&
- is_printable_array(data + offset, len)) {
+ if (field->flags & TEP_FIELD_IS_STRING)
ret = PyUnicode_FromString((char *)data + offset);
- } else {
+ else
ret = PyByteArray_FromStringAndSize((const char *) data + offset, len);
- field->flags &= ~TEP_FIELD_IS_STRING;
- }
} else {
val = tep_read_number(pevent, data + field->offset,
field->size);