Re: [PATCH v1 01/13] perf/core: add union to struct perf_branch_entry
From: Michael Ellerman
Date: Fri Sep 10 2021 - 08:09:31 EST
Peter Zijlstra <peterz@xxxxxxxxxxxxx> writes:
> On Thu, Sep 09, 2021 at 12:56:48AM -0700, Stephane Eranian wrote:
>> diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
>> index f92880a15645..eb11f383f4be 100644
>> --- a/include/uapi/linux/perf_event.h
>> +++ b/include/uapi/linux/perf_event.h
>> @@ -1329,13 +1329,18 @@ union perf_mem_data_src {
>> struct perf_branch_entry {
>> __u64 from;
>> __u64 to;
>> - __u64 mispred:1, /* target mispredicted */
>> - predicted:1,/* target predicted */
>> - in_tx:1, /* in transaction */
>> - abort:1, /* transaction abort */
>> - cycles:16, /* cycle count to last branch */
>> - type:4, /* branch type */
>> - reserved:40;
>> + union {
>> + __u64 val; /* to make it easier to clear all fields */
>> + struct {
>> + __u64 mispred:1, /* target mispredicted */
>> + predicted:1,/* target predicted */
>> + in_tx:1, /* in transaction */
>> + abort:1, /* transaction abort */
>> + cycles:16, /* cycle count to last branch */
>> + type:4, /* branch type */
>> + reserved:40;
>> + };
>> + };
>> };
>
>
> Hurpmh... all other bitfields have ENDIAN_BITFIELD things except this
> one. Power folks, could you please have a look?
The bit number of each field changes between big and little endian, but
as long as kernel and userspace are the same endian, and both only
access values via the bitfields then it works.
It's preferable to have the ENDIAN_BITFIELD thing, so that the bit
numbers are stable, but we can't change it now without breaking ABI :/
Adding the union risks having code that accesses val and expects to see
mispred in bit 0 for example, which it's not on big endian.
If it's just for clearing easily we could do that with a static inline
that sets all the bitfields. With my compiler here (GCC 10) it's smart
enough to just turn it into a single unsigned long store of 0.
eg:
static inline void clear_perf_branch_entry_flags(struct perf_branch_entry *e)
{
e->mispred = 0;
e->predicted = 0;
e->in_tx = 0;
e->abort = 0;
e->cycles = 0;
e->type = 0;
e->reserved = 0;
}
It does look like we have a bug in perf tool though, if I take a
perf.data from a big endian system to a little endian one I don't see
any of the branch flags decoded. eg:
BE:
2413132652524 0x1db8 [0x2d0]: PERF_RECORD_SAMPLE(IP, 0x1): 5279/5279: 0xc00000000045c028 period: 923003 addr: 0
... branch stack: nr:28
..... 0: c00000000045c028 -> c00000000dce7604 0 cycles P 0
LE:
2413132652524 0x1db8 [0x2d0]: PERF_RECORD_SAMPLE(IP, 0x1): 5279/5279: 0xc00000000045c028 period: 923003 addr: 0
... branch stack: nr:28
..... 0: c00000000045c028 -> c00000000dce7604 0 cycles 0
^
missing P
I guess we're missing a byte swap somewhere.
cheers