Re: [PATCH] perf test: Fix perf test 11 hwmon endianess issue
From: David Laight
Date: Sat Feb 01 2025 - 12:14:46 EST
On Sat, 1 Feb 2025 08:12:38 -0800
Ian Rogers <irogers@xxxxxxxxxx> wrote:
> On Sat, Feb 1, 2025 at 3:34 AM David Laight
> <david.laight.linux@xxxxxxxxx> wrote:
> >
> > On Fri, 31 Jan 2025 12:24:00 +0100
> > Thomas Richter <tmricht@xxxxxxxxxxxxx> wrote:
> >
> > > perf test 11 hwmon fails on s390 with this error
> > >
> > > # ./perf test -Fv 11
> > > --- start ---
> > > ---- end ----
> > > 11.1: Basic parsing test : Ok
> > > --- start ---
> > > Testing 'temp_test_hwmon_event1'
> > > Using CPUID IBM,3931,704,A01,3.7,002f
> > > temp_test_hwmon_event1 -> hwmon_a_test_hwmon_pmu/temp_test_hwmon_event1/
> > > FAILED tests/hwmon_pmu.c:189 Unexpected config for
> > > 'temp_test_hwmon_event1', 292470092988416 != 655361
> > > ---- end ----
> > > 11.2: Parsing without PMU name : FAILED!
> > > --- start ---
> > > Testing 'hwmon_a_test_hwmon_pmu/temp_test_hwmon_event1/'
> > > FAILED tests/hwmon_pmu.c:189 Unexpected config for
> > > 'hwmon_a_test_hwmon_pmu/temp_test_hwmon_event1/',
> > > 292470092988416 != 655361
> > > ---- end ----
> > > 11.3: Parsing with PMU name : FAILED!
> > > #
> > >
> > > The root cause is in member test_event::config which is initialized
> > > to 0xA0001 or 655361. During event parsing a long list event parsing
> > > functions are called and end up with this gdb call stack:
> > ...
> > > However member key::type_and_num is defined as union and bit field:
> > >
> > > union hwmon_pmu_event_key {
> > > long type_and_num;
> > > struct {
> > > int num :16;
> > > enum hwmon_type type :8;
> > > };
> > > };
> >
> > That is entirely horrid.
>
> It also has the side-effect that if you initialize the struct bits in
> the type_and_num will be undefined and sanitizers will try to trip you
> up.
>
> > I'm surprised this even compiles:
> > static size_t hwmon_pmu__event_hashmap_hash(long key, void *ctx __maybe_unused)
> > {
> > return ((union hwmon_pmu_event_key)key).type_and_num;
> > }
> > It has to be just 'return key', but I'm not sure what the hashmap code is doing.
> >
> > AFAICT the code is just trying to generate a value for the hashmap to hash on?
> > Why not just use (type << 16 | num) instead of 'trying to be clever' with a union?
>
> The purpose wasn't so much to be 'clever' as you say. Perf event
> configs, which is where type_and_num lives in the events, have their
> bitfields described either by convention or by files in
> /sys/devices/<pmu>/format. On an Intel laptop:
> ```
> $ cat /sys/devices/cpu/format/inv
> config:23
> ```
> So I can go:
> ```
> $ perf stat -e 'inst_retired.any/inv=1/' true
>
> Performance counter stats for 'true':
>
> 1,069,756 inst_retired.any/inv=1/
>
> 0.001539265 seconds time elapsed
>
> 0.001719000 seconds user
> 0.000000000 seconds sys
> ```
> and the configuration of my inst_retired.any event now has bit 23 in
> the config set. Just as the format files try to break down what the
> bitfields within a config u64 are doing, the union was trying to do
> similar. Making an attempt to have the value be intention revealing
> which the shift and masking may lose, for example, by making it look
> like bits in the num could overlap with and modify the type.
Except that is really doesn't work for big-endian at all.
The 'fix' doesn't even really paper over the cracks properly.
I'm not even sure the hashing works on 64bit BE.
(assuming the int:n are also BE - but that is a different ABI option.)
The 'num' and 'type' will be in the high 24 bits of the 64bit 'long'.
If we assume the rest of the union is actually initialised the other
bits are all zero.
And I guess that the hash function is masking with 2^n-1 - so always gets zero.
David
>
> Thanks,
> Ian