Re: [PATCH v12 00/19] perf: Use e_machine and lazily compute symbols
From: Arnaldo Carvalho de Melo
Date: Wed Jun 03 2026 - 15:51:17 EST
On Tue, Jun 02, 2026 at 10:39:49PM -0700, Namhyung Kim wrote:
> On Tue, Jun 02, 2026 at 09:53:59AM -0700, Ian Rogers wrote:
> > On Tue, Jun 2, 2026 at 8:25 AM Ian Rogers <irogers@xxxxxxxxxx> wrote:
> > >
> > > Add a helper to perf_env to compute the e_machine if it is EM_NONE.
> > > Derive the value from the arch string if available. Similarly derive
> > > the arch string from the ELF machine if available, for consistency.
> > > This means perf's arch (machine type) is no longer determined by uname
> > > but set to match that of the perf ELF executable.
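As a rough illustration of the arch string to e_machine derivation described above, here is a minimal sketch. It is not the code from the series: the `demo_` names and the table contents are hypothetical, and only the `EM_*` constants come from `<elf.h>`. The `#ifndef` fallbacks assume an older libc may lack the newer constants.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Fallbacks for older libc headers that lack newer machine constants. */
#ifndef EM_AARCH64
#define EM_AARCH64 183
#endif
#ifndef EM_RISCV
#define EM_RISCV 243
#endif

/* Hypothetical table pairing uname-style arch strings with ELF machines. */
struct demo_arch_map {
	const char *arch;
	unsigned short e_machine;
};

static const struct demo_arch_map demo_arch_table[] = {
	{ "x86_64",  EM_X86_64 },
	{ "i686",    EM_386 },
	{ "aarch64", EM_AARCH64 },
	{ "arm64",   EM_AARCH64 },	/* alias: several names, one machine */
	{ "ppc64le", EM_PPC64 },
	{ "s390x",   EM_S390 },
	{ "riscv64", EM_RISCV },
};

#define DEMO_ARCH_TABLE_LEN \
	(sizeof(demo_arch_table) / sizeof(demo_arch_table[0]))

/* Return the ELF machine for an arch string, or EM_NONE if unknown. */
unsigned short demo_arch_to_e_machine(const char *arch)
{
	if (!arch)
		return EM_NONE;
	for (size_t i = 0; i < DEMO_ARCH_TABLE_LEN; i++) {
		if (!strcmp(arch, demo_arch_table[i].arch))
			return demo_arch_table[i].e_machine;
	}
	return EM_NONE;
}

/* Return the first arch string for an ELF machine, or NULL if unknown. */
const char *demo_e_machine_to_arch(unsigned short e_machine)
{
	for (size_t i = 0; i < DEMO_ARCH_TABLE_LEN; i++) {
		if (demo_arch_table[i].e_machine == e_machine)
			return demo_arch_table[i].arch;
	}
	return NULL;
}
```

The aliasing in the table ("aarch64" and "arm64" both mapping to EM_AARCH64) is the kind of naming difference that makes integer e_machine comparisons more robust than strcmp on the arch string.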
> > >
> > > Migrate code away from strcmp on env->arch to using the e_machine
> > > comparisons that are more accurate and not prone to uname and other
> > > naming differences. While cleaning this up, also clean up the
> > > capstone initialization code to cover more architectures and to set
> > > the big endian flag based on ELF header information.
> > >
> > > Refactor perf_env__arch_strerrno to take an e_machine instead of an
> > > architecture string, removing the HAVE_LIBTRACEEVENT dependency
> > > entirely and making it unconditionally available. The generated errno
> > > table includes fallback definitions for newer ELF machine constants to
> > > ensure compatibility with older host glibc versions.
> > >
> > > Introduce a mutex in perf_env to safely protect lazy metadata setup,
> > > such as os_release or e_machine resolution, preventing concurrent
> > > initialization data races and memory leaks during multi-threaded
> > > profiling or symbol loading. Properly initialize stack-allocated
> > > perf_env instances to ensure safe mutex destruction.
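The mutex-protected lazy setup described above might look roughly like the following sketch. It assumes pthreads and uses a hypothetical `struct demo_env` rather than the real perf_env; the field and function names are illustrative only.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <sys/utsname.h>

/* Hypothetical stand-in for perf_env with one lazily computed field. */
struct demo_env {
	pthread_mutex_t lock;
	char *os_release;	/* NULL until first requested */
};

/* Must be called even for stack-allocated instances, so that the
 * mutex is valid when demo_env__exit() later destroys it. */
void demo_env__init(struct demo_env *env)
{
	pthread_mutex_init(&env->lock, NULL);
	env->os_release = NULL;
}

/* Compute os_release on first use. The lock ensures that two threads
 * cannot both allocate the string, which would race and leak one copy. */
const char *demo_env__os_release(struct demo_env *env)
{
	const char *ret;

	pthread_mutex_lock(&env->lock);
	if (!env->os_release) {
		struct utsname uts;

		if (uname(&uts) == 0) {
			size_t len = strlen(uts.release) + 1;
			char *copy = malloc(len);

			if (copy) {
				memcpy(copy, uts.release, len);
				env->os_release = copy;
			}
		}
	}
	ret = env->os_release;
	pthread_mutex_unlock(&env->lock);
	return ret;
}

void demo_env__exit(struct demo_env *env)
{
	free(env->os_release);
	env->os_release = NULL;
	pthread_mutex_destroy(&env->lock);
}
```

A caller would pair demo_env__init() and demo_env__exit() around any use, including on-stack instances, and repeated calls to demo_env__os_release() return the same cached pointer.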
> > >
> > > Switch the idle computation to the point of use and lazily compute it,
> > > rather than computing it for every symbol. The only current user is
> > > `perf top`. At the point of use the perf_env is available and this can
> > > be used to make sure the idle function computation correctly accounts
> > > for architecture-specific and kernel-version-specific patterns.
> > > To prevent concurrent updates to shared symbol bitfield flags, migrate
> > > bitfield variables in struct symbol to C11 atomic flags.
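A minimal sketch of lazily computing an idle flag with C11 atomics, along the lines described above. This is an assumption-laden illustration, not the series' code: `struct demo_symbol`, the flag names, and the short idle-name list are hypothetical, and the real computation also consults the perf_env for architecture and kernel version.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define DEMO_SYM_IDLE_COMPUTED	0x1u	/* idle state has been resolved */
#define DEMO_SYM_IDLE		0x2u	/* symbol is an idle function */

/* Hypothetical symbol with flags packed into one atomic byte. */
struct demo_symbol {
	const char *name;
	atomic_uchar flags;
};

/* Illustrative subset of idle function names. */
static const char *const demo_idle_names[] = {
	"default_idle", "arch_cpu_idle", "cpu_idle",
};

void demo_symbol__init(struct demo_symbol *sym, const char *name)
{
	sym->name = name;
	atomic_init(&sym->flags, 0);
}

/* Resolve the idle flag at the point of use instead of at load time. */
bool demo_symbol__is_idle(struct demo_symbol *sym)
{
	unsigned char flags = atomic_load(&sym->flags);

	if (!(flags & DEMO_SYM_IDLE_COMPUTED)) {
		unsigned char update = DEMO_SYM_IDLE_COMPUTED;
		size_t n = sizeof(demo_idle_names) / sizeof(demo_idle_names[0]);

		for (size_t i = 0; i < n; i++) {
			if (!strcmp(sym->name, demo_idle_names[i])) {
				update |= DEMO_SYM_IDLE;
				break;
			}
		}
		/* Racing threads compute identical bits, so OR-ing them in
		 * twice is harmless and no other flag bit is clobbered. */
		flags = atomic_fetch_or(&sym->flags, update) | update;
	}
	return (flags & DEMO_SYM_IDLE) != 0;
}
```

The atomic OR is what a plain bitfield cannot offer: a non-atomic read-modify-write of a shared byte could lose an update to a neighbouring flag made by another thread.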
> >
> > So I think this series is at the point where Sashiko [1] is giving
> > warnings only for out-of-scope things and pre-existing conditions. I
> > will give a detailed explanation below, but I'd appreciate help moving
> > this forward with human review and submission. Thanks!
> >
> > > Ian Rogers (19):
> > > perf env: Add perf_env__e_machine helper and use in perf_env__arch
> >
> > 1 critical and 2 high issues.
> > The issues relate to existing data races, the inaccurate arch string,
> > and normalizing the arch string stored in the data file. The existing
> > data races don't bite us currently due to the single threaded nature
> > of most of perf - multithreading is on the TODO list. The arch string
> > is inaccurate and the e_machine in newer perf.data files resolves
> > this. If we were using the arch string without the e_machine then the
> > concerns over its use are valid, but this series is trying to remove
> > the use of the arch string and strongly prefer the e_machine.
> >
> > > perf tests topology: Switch env->arch use to env->e_machine
> >
> > No regressions.
> >
> > > perf env, dso, thread: Add _endian variants for e_machine helpers
> >
> > 1 high issue for a potential pre-existing SEGV if a thread lacks maps.
> > Let's hope that doesn't happen; the example given assumes a
> > multithreaded environment and multi-threading is on the TODO list.
> >
> > > perf capstone: Determine architecture from e_machine
> >
> > 1 low issue. A flag only present in capstone 4.0 is used. As capstone
> > 4.0 was released in 2018, let's just assume the flag is there rather
> > than adding yet more complexity.
> >
> > > perf print_insn: Use e_machine for fallback IP length check
> >
> > No regressions.
> >
> > > perf symbol: Avoid use of machine__is
> >
> > 1 high issue. Concerns over pre-existing cross-platform analysis
> > problems. Cross-platform analysis fully working is on the TODO list.
> >
> > > perf machine: Use perf_env e_machine rather than arch
> > > perf sample-raw: Use perf_env e_machine rather than arch
> > > perf sort: Use perf_env e_machine rather than arch
> > > perf arch common: Use perf_env e_machine rather than arch
> > > perf header: In print_pmu_caps use perf_env e_machine
> > > perf c2c: Use perf_env e_machine rather than arch
> > > perf lock-contention: Use perf_env e_machine rather than arch
> > > perf env: Refactor perf_env__arch_strerrno
> > > perf env: Remove unused perf_env__raw_arch
> >
> > No regressions x9.
> >
> > > perf env: Add mutex to protect lazy environment initialization
> >
> > 1 medium issue requesting more locking on more bits of perf_env.
> > Multi-threading is on the TODO list and let's stop the feature creep
> > here.
> >
> > > perf env: Add helper to lazily compute the os_release
> >
> > 1 high issue. Concern over a perf data issue in pipe mode. Addressing
> > this would require a fairly major overhaul of perf data, so let's add
> > fixing it to the TODO list.
> >
> > > perf symbol: Add setters for bitfields sharing a byte to avoid
> > > concurrent update issues
> > > perf symbol: Lazily compute idle
> >
> > No regressions x2.
>
> Acked-by: Namhyung Kim <namhyung@xxxxxxxxxx>
Thanks, applied to perf-tools-next, for v7.2.
- Arnaldo