Re: [GIT PULL] perf tools changes for v6.10

From: Linus Torvalds
Date: Sat May 25 2024 - 13:22:44 EST


On Sat, 25 May 2024 at 00:38, Namhyung Kim <namhyung@xxxxxxxxxx> wrote:
>
> Yep, so I'm curious what makes it fail. IIUC the commit in question
> would convert "cycles" event to "${whatever_pmu}/cycles/" if the pmu
> has "events/cycles" alias in sysfs or in JSON. But it should work after
> that too. :(

Well, having looked at it some more, I have the main CPU PMU:

/sys/bus/event_source/devices/armv8_pmuv3_0/

which has a 'cpu_cycles' event, and the cpumask covers all the 128
cores in this thing.

That's the thing the regular profiling *should* use:

$ ls events/
br_mis_pred l1d_cache_wb l3d_cache_allocate
br_mis_pred_retired l1d_tlb l3d_cache_refill
br_pred l1d_tlb_refill l3d_cache_wb
br_retired l1i_cache mem_access
bus_access l1i_cache_refill memory_error
bus_cycles l1i_tlb sample_collision
cid_write_retired l1i_tlb_refill sample_feed
cpu_cycles l2d_cache sample_filtrate
exc_return l2d_cache_allocate sample_pop
exc_taken l2d_cache_refill stall_backend
inst_retired l2d_cache_wb stall_frontend
inst_spec l2d_tlb ttbr_write_retired
l1d_cache l2d_tlb_refill
l1d_cache_refill l3d_cache

But the thing that seems to have confused the new parsing is that this
machine _also_ has

/sys/bus/event_source/devices/arm_dsu_{0..63}/

which all have a 'cycles' event:

$ ls events/
bus_access cycles l3d_cache_allocate l3d_cache_wb
bus_cycles l3d_cache l3d_cache_refill memory_error

but these things are basically some "each pair of cpus has some bridge
to the L3", which is why this 128-core machine has 64 of them. So each
of those have something like this:

$ cat cpumask
40
$ cat associated_cpus
40-41

I did not look any closer at what the 'cycles' there can actually
count, but clearly they can't be used for recording. And even if they
can, they shouldn't for the default "cycles".

> It seems my system doesn't have the alias (both in x86_64 and arm64)
> at least in sysfs. I think that's why I don't see the failure and maybe
> there's a specific hardware issue - like a M1 macbook issue? IIRC it
> required the exclude_guest bit to be set, but I think we handled it already
> so this must be a different issue.

This is my new Ampere system, which has basically replaced the M2
Macbook Air as my arm64 test platform, since it builds the kernel a
whole lot faster. That's what 128 cores will do...

Ian's patch at

https://lore.kernel.org/lkml/20240525152927.665498-1-irogers@xxxxxxxxxx/

makes things work for me again, but I get the feeling that the JSON
parsing is too fragile. It should have noticed that the 'cycles' event
it found wasn't useful for profiling, and gone on to the next one,
instead of just giving an incomprehensible error message.

I think the real problem was that the JSON parsing code blindly just
looked for "cycles".

The only thing that seems to make Ian's new
evlist__add_default_events() special is that it uses
"perf_pmus__scan_core()" for that cycles finding, which in turn then
uses only the *core* PMU's.

It feels to me like the "parse the JSON info" is just too fragile,
picks the wrong events, and Ian's "add defaults" just works around the
weakness.

But whatever. With his patch, at least it works for me.

Linus