Re: [PATCH] perf test: Fix perf test 11 hwmon endianess issue

From: Ian Rogers
Date: Sat Feb 01 2025 - 11:13:01 EST


On Sat, Feb 1, 2025 at 3:34 AM David Laight
<david.laight.linux@xxxxxxxxx> wrote:
>
> On Fri, 31 Jan 2025 12:24:00 +0100
> Thomas Richter <tmricht@xxxxxxxxxxxxx> wrote:
>
> > perf test 11 hwmon fails on s390 with this error
> >
> > # ./perf test -Fv 11
> > --- start ---
> > ---- end ----
> > 11.1: Basic parsing test : Ok
> > --- start ---
> > Testing 'temp_test_hwmon_event1'
> > Using CPUID IBM,3931,704,A01,3.7,002f
> > temp_test_hwmon_event1 -> hwmon_a_test_hwmon_pmu/temp_test_hwmon_event1/
> > FAILED tests/hwmon_pmu.c:189 Unexpected config for
> > 'temp_test_hwmon_event1', 292470092988416 != 655361
> > ---- end ----
> > 11.2: Parsing without PMU name : FAILED!
> > --- start ---
> > Testing 'hwmon_a_test_hwmon_pmu/temp_test_hwmon_event1/'
> > FAILED tests/hwmon_pmu.c:189 Unexpected config for
> > 'hwmon_a_test_hwmon_pmu/temp_test_hwmon_event1/',
> > 292470092988416 != 655361
> > ---- end ----
> > 11.3: Parsing with PMU name : FAILED!
> > #
> >
> > The root cause is in member test_event::config which is initialized
> > to 0xA0001 or 655361. During event parsing a long list event parsing
> > functions are called and end up with this gdb call stack:
> ...
> > However member key::type_and_num is defined as union and bit field:
> >
> > union hwmon_pmu_event_key {
> > long type_and_num;
> > struct {
> > int num :16;
> > enum hwmon_type type :8;
> > };
> > };
>
> That is entirely horrid.

It also has the side-effect that if you initialize the struct bits in
the type_and_num will be undefined and sanitizers will try to trip you
up.

> I'm surprised this even compiles:
> static size_t hwmon_pmu__event_hashmap_hash(long key, void *ctx __maybe_unused)
> {
> return ((union hwmon_pmu_event_key)key).type_and_num;
> }
> It has to be just 'return key', but I'm not sure what the hashmap code is doing.
>
> AFAICT the code is just trying to generate a value for the hashmap to hash on?
> Why not just use (type << 16 | num) instead of 'trying to be clever' with a union?

The purpose wasn't so much to be 'clever' as you say. Perf event
configs, which is where type_and_num lives in the events, have their
bitfields described either by convention or by files in
/sys/devices/<pmu>/format. On an Intel laptop:
```
$ cat /sys/devices/cpu/format/inv
config:23
```
So I can go:
```
$ perf stat -e 'inst_retired.any/inv=1/' true

Performance counter stats for 'true':

1,069,756 inst_retired.any/inv=1/

0.001539265 seconds time elapsed

0.001719000 seconds user
0.000000000 seconds sys
```
and the configuration of my inst_retired.any event now has bit 23 in
the config set. Just as the format files try to break down what the
bitfields within a config u64 are doing, the union was trying to do
similar. Making an attempt to have the value be intention revealing
which the shift and masking may lose, for example, by making it look
like bits in the num could overlap with and modify the type.

Thanks,
Ian