Re: [PATCH v1 01/13] perf/core: add union to struct perf_branch_entry

From: Madhavan Srinivasan
Date: Fri Sep 17 2021 - 02:38:25 EST



On 9/15/21 11:33 AM, Stephane Eranian wrote:
Michael,


On Fri, Sep 10, 2021 at 7:16 AM Michael Ellerman <mpe@xxxxxxxxxxxxxx> wrote:
Michael Ellerman <mpe@xxxxxxxxxxxxxx> writes:
Peter Zijlstra <peterz@xxxxxxxxxxxxx> writes:
On Thu, Sep 09, 2021 at 12:56:48AM -0700, Stephane Eranian wrote:
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index f92880a15645..eb11f383f4be 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -1329,13 +1329,18 @@ union perf_mem_data_src {
struct perf_branch_entry {
__u64 from;
__u64 to;
- __u64 mispred:1, /* target mispredicted */
- predicted:1,/* target predicted */
- in_tx:1, /* in transaction */
- abort:1, /* transaction abort */
- cycles:16, /* cycle count to last branch */
- type:4, /* branch type */
- reserved:40;
+ union {
+ __u64 val; /* to make it easier to clear all fields */
+ struct {
+ __u64 mispred:1, /* target mispredicted */
+ predicted:1,/* target predicted */
+ in_tx:1, /* in transaction */
+ abort:1, /* transaction abort */
+ cycles:16, /* cycle count to last branch */
+ type:4, /* branch type */
+ reserved:40;
+ };
+ };
};

Hurpmh... all other bitfields have ENDIAN_BITFIELD things except this
one. Power folks, could you please have a look?
The bit number of each field changes between big and little endian, but
as long as kernel and userspace are the same endian, and both only
access values via the bitfields then it works.
...
It does look like we have a bug in perf tool though, if I take a
perf.data from a big endian system to a little endian one I don't see
any of the branch flags decoded. eg:

BE:

2413132652524 0x1db8 [0x2d0]: PERF_RECORD_SAMPLE(IP, 0x1): 5279/5279: 0xc00000000045c028 period: 923003 addr: 0
... branch stack: nr:28
..... 0: c00000000045c028 -> c00000000dce7604 0 cycles P 0

LE:

2413132652524 0x1db8 [0x2d0]: PERF_RECORD_SAMPLE(IP, 0x1): 5279/5279: 0xc00000000045c028 period: 923003 addr: 0
... branch stack: nr:28
..... 0: c00000000045c028 -> c00000000dce7604 0 cycles 0
^
missing P

I guess we're missing a byte swap somewhere.
Ugh. We _do_ have a byte swap, but we also need a bit swap.

That works for the single bit fields, not sure if it will for the
multi-bit fields.

So that's a bit of a mess :/

Based on what I see in perf_event.h for other structures, I think I
can make up what you would need for struct branch_entry. But Iit would
be easier if you could send me a patch that you would have verified on
your systems.
Thanks.
Attached patch fixes the issue. Have tested both in both in BE and LE case.

Maddy

From f816ba2e6ef8d5975f78442d7ecb50d66c3c4326 Mon Sep 17 00:00:00 2001
From: Madhavan Srinivasan <maddy@xxxxxxxxxxxxx>
Date: Wed, 15 Sep 2021 22:29:09 +0530
Subject: [RFC PATCH] tools/perf: Add reverse_64b macro

branch_stack struct has bit field definition
producing different bit ordering for big/little endian.
Because of this, when branch_stack sample collected
in a BE system viewed/reported in a LE system,
bit fields of the branch stack are not presented
properly. To address this issue, a reverse_64b
macro is defined and introduced in evsel__parse_sample.

Signed-off-by: Madhavan Srinivasan <maddy@xxxxxxxxxxxxx>
---
 tools/perf/util/evsel.c | 35 +++++++++++++++++++++++++++++++++--
 1 file changed, 33 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index dbfeceb2546c..3151606e516e 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -2221,6 +2221,9 @@ void __weak arch_perf_parse_sample_weight(struct perf_sample *data,
     data->weight = *array;
 }

+#define reverse_64b(src, pos, size)    \
+    (((src >> pos) & (( 1ull <<size) - 1)) << (63 - (pos + size - 1)))
+
 int evsel__parse_sample(struct evsel *evsel, union perf_event *event,
             struct perf_sample *data)
 {
@@ -2408,6 +2411,8 @@ int evsel__parse_sample(struct evsel *evsel, union perf_event *event,
     if (type & PERF_SAMPLE_BRANCH_STACK) {
         const u64 max_branch_nr = UINT64_MAX /
                       sizeof(struct branch_entry);
+        struct branch_entry *e;
+        unsigned i;

         OVERFLOW_CHECK_u64(array);
         data->branch_stack = (struct branch_stack *)array++;
@@ -2416,10 +2421,36 @@ int evsel__parse_sample(struct evsel *evsel, union perf_event *event,
             return -EFAULT;

         sz = data->branch_stack->nr * sizeof(struct branch_entry);
-        if (evsel__has_branch_hw_idx(evsel))
+        if (evsel__has_branch_hw_idx(evsel)) {
             sz += sizeof(u64);
-        else
+            e = &data->branch_stack->entries[0];
+        } else {
             data->no_hw_idx = true;
+            e = (struct branch_entry *)&data->branch_stack->hw_idx;
+        }
+
+        if (swapped) {
+            for (i = 0; i < data->branch_stack->nr; i++, e++) {
+                u64 new_val = 0;
+
+                /* mispred:1  target mispredicted */
+                new_val = reverse_64b(e->flags.value, 0, 1);
+                /* predicted:1  target predicted */
+                new_val |= reverse_64b(e->flags.value, 1, 1);
+                /* in_tx:1  in transaction */
+                new_val |= reverse_64b(e->flags.value, 2, 1);
+                /* abort:1  transaction abort */
+                new_val |= reverse_64b(e->flags.value, 3, 1);
+                /* cycles:16  cycle count to last branch */
+                new_val |= reverse_64b(e->flags.value, 4, 16);
+                /* type:4  branch type */
+                new_val |= reverse_64b(e->flags.value, 20, 4);
+                /* reserved:40 */
+                new_val |= reverse_64b(e->flags.value, 24, 40);
+                e->flags.value = new_val;
+            }
+        }
+
         OVERFLOW_CHECK(array, sz, max_size);
         array = (void *)array + sz;
     }
--
2.31.1