Re: [PATCH v20 0/5] perf tools: Add inject --aslr feature
From: Arnaldo Carvalho de Melo
Date: Fri Jun 12 2026 - 20:26:16 EST
On Thu, Jun 11, 2026 at 11:29:02AM -0700, Ian Rogers wrote:
> On Thu, Jun 11, 2026 at 9:41 AM Ian Rogers <irogers@xxxxxxxxxx> wrote:
> >
> > This patch series introduces the new 'perf inject --aslr' feature to
> > remap virtual memory addresses or drop physical memory event leaks
> > when profile record data is shared between machines. Bundled with this
> > feature is a bug fix inside the core map tracking tool that hardens
> > perf session analysis against concurrent lookup data races.
> >
> > Detailed Mechanism of MMAP Mapping and ASLR virtual Address Allocation:
> >
> > The ASLR tool virtualizes the address space of the recorded processes by
> > intercepting MMAP and MMAP2 events to build a consistent translation
> > database, which is subsequently used to rewrite sample addresses.
> >
> > It maintains two primary lookup databases using hash maps:
> > 1. 'remap_addresses': Maps an original mapping key to its new remapped
> > base address. The key uses topological invariant coordinates:
> > (machine, dso, invariant). The invariant is computed as (start - pgoff)
> > for DSO-backed mappings. This invariant remains constant even when
> > perf's internal overlap-resolution splits a VMA into fragmented
> > pieces, ensuring split maps resolve consistently back to the same
> > remapped base.
> > 2. 'top_addresses': Tracks the allocation state per process (machine, pid).
> > It maintains 'remapped_max' (the highest allocated address in the
> > virtualized space).
> >
> > For each MMAP/MMAP2 event:
> > - We look up the DSO and invariant key in 'remap_addresses'. If found, we
> > reuse the translation, preserving the offset within the mapping.
> > - If not found, we allocate a new remapped address space:
> > - We use thread__find_map to look up the mapping immediately preceding
> > the new one in the original address space (at start - 1). If the
> > preceding mapping was also remapped, we place the new mapping
> > contiguously after it in the remapped space. This preserves
> > contiguity of split mappings (e.g., symbols split by HugeTLB,
> > or anonymous .bss segments adjacent to initialized data).
> > - If no contiguous mapping is found, we insert a 1-page gap from
> > the highest allocated address (remapped_max) to prevent accidental
> > merging of unrelated VMAs.
> > - The event's start address (and pgoff for kernel maps) is rewritten,
> > and the event is delegated to the output writer.
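To check my reading of the allocation scheme described above, here is a minimal, self-contained sketch. It is not the code in aslr.c: every name (sketch_map, sketch_state, sketch_remap) is hypothetical, a fixed-size array stands in for the two hash maps, a numeric dso_id stands in for the (machine, dso) part of the key, and a linear scan stands in for thread__find_map().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL
#define SKETCH_MAX_MAPS  64

/* One remembered mapping; all names here are hypothetical. */
struct sketch_map {
	uint64_t dso_id;     /* stand-in for the (machine, dso) part of the key */
	uint64_t invariant;  /* start - pgoff, identical for every split piece */
	uint64_t orig_start; /* original range, used for the "start - 1" probe */
	uint64_t orig_end;
	uint64_t new_start;  /* remapped address of orig_start */
};

struct sketch_state {
	struct sketch_map maps[SKETCH_MAX_MAPS];
	size_t nr;
	uint64_t remapped_max; /* highest address handed out so far */
};

/* Returns false only when the fixed-size table is full. */
static bool sketch_remap(struct sketch_state *st, uint64_t dso_id,
			 uint64_t start, uint64_t len, uint64_t pgoff,
			 uint64_t *new_start)
{
	uint64_t invariant = start - pgoff;
	const struct sketch_map *prev = NULL;
	size_t i;

	for (i = 0; i < st->nr; i++) {
		const struct sketch_map *m = &st->maps[i];

		/* Known key: reuse the translation, keep the offset. */
		if (m->dso_id == dso_id && m->invariant == invariant) {
			*new_start = m->new_start + (start - m->orig_start);
			return true;
		}
		/* Does this mapping contain the byte at start - 1? */
		if (start > m->orig_start && start - 1 < m->orig_end)
			prev = m;
	}
	if (st->nr == SKETCH_MAX_MAPS)
		return false;

	if (prev) /* contiguous in the original space: stay contiguous */
		*new_start = prev->new_start + (start - prev->orig_start);
	else      /* unrelated VMA: leave a one-page gap */
		*new_start = st->remapped_max + SKETCH_PAGE_SIZE;

	st->maps[st->nr++] = (struct sketch_map){
		dso_id, invariant, start, start + len, *new_start
	};
	if (*new_start + len > st->remapped_max)
		st->remapped_max = *new_start + len;
	return true;
}
```

In this sketch a split piece (start and pgoff both shifted by the same delta) hits the first branch and lands at the same delta inside the remapped range, which is the property the invariant key is meant to give.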
> >
> > To remain strictly conservative and guarantee security, the tool
> > scrubs breakpoint addresses (bp_addr) from all synthesized stream
> > headers, completely drops PERF_RECORD_TEXT_POKE events to prevent
> > leaks of absolute immediate pointer operands, and drops unsupported
> > complex payloads (such as user register stacks, raw tracepoints, and
> > hardware AUX tracing frames).
> >
> > Verification is reinforced with a shell test ('inject_aslr.sh').
> >
> > Prerequisite Bug Fix (Patch 1). During development, a core map
> > indexing issue was identified and resolved to prevent concurrent
> > lookup data races during session analysis.
> >
> > Changes since v19:
> > - Patch 1: Group lock and unlock operations inside maps__mutate_mapping() into
> > a single conditional block to resolve Clang 15 -Wthread-safety-analysis
> > compilation errors.
> > - Patch 5: Skip kernel-based ASLR test cases (test_kernel_aslr and
> > test_kernel_report_aslr) on ARM architectures (aarch64 and arm*) to
> > bypass high latency constraints and symbolization inconsistencies.
> >
> > Changes since v18:
> > - Patch 2 & 3: Squashed the bounds checking boundary fixes into the "Strip
> > sample registers" patch. The array bounds checking now correctly uses
> > 'orig_sample_type' to traverse the event payload, preventing heap
> > corruption when dealing with events that have had their registers
> > stripped by the ASLR tool pipeline.
> > - Patch 2 & 3: Rebased the commit series to properly isolate the sample
> > address remapping logic from the register stripping logic.
> > - Patch 2 & 3: Expanded commit messages to extensively document the
> > cross-endian behavior of 'perf inject'. Because 'perf inject' effectively
> > acts as an endianness converter (writing a host-endian PERF_MAGIC and
> > flushing events exactly as they sit in memory after being byte-swapped
> > by perf_event__all64_swap), all injected events must be perfectly
> > constructed in the host's native endianness. Specifically,
> > perf_event__all64_swap byte-swaps the raw 64-bit payloads, which causes
> > 32-bit sequential fields like PERF_SAMPLE_TID (containing pid and tid)
> > to have their ordering reversed in memory (e.g., [BE_pid][BE_tid] becomes
> > [LE_tid][LE_pid]). The ASLR tool's sample construction logic was
> > expanded to explicitly unpack these fields and repack them sequentially
> > via unions to guarantee a strictly host-endian layout that resolves
> > these inversion anomalies. Similarly, branch stack flags (which are
> > modified in-place to host-endian bitfields by the parser) are copied
> > directly to the newly synthesized event, and 'needs_swap=false' is explicitly
> > used when re-parsing the synthesized event to prevent erroneous double
> > swapping.
> > - Series: Verified cross-endian robustness via the sashiko analyzer.
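The pid/tid inversion described in the v18 notes can be reproduced with a small host-independent sketch. It works on raw bytes rather than on perf's structures, all helper names are hypothetical, and the final repack swaps the two 32-bit halves with memcpy instead of the union-based repacking the cover letter mentions.

```c
#include <stdint.h>
#include <string.h>

/* Reverse 8 bytes in place: the in-memory effect of a 64-bit byte swap. */
static void sketch_swap64_bytes(uint8_t w[8])
{
	for (int i = 0; i < 4; i++) {
		uint8_t t = w[i];

		w[i] = w[7 - i];
		w[7 - i] = t;
	}
}

/* Store a 32-bit value big-endian, as a BE recorder would have. */
static void sketch_put_be32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
}

/* Load a 32-bit value little-endian, as an LE host would read it. */
static uint32_t sketch_get_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Put the two 32-bit halves back in their original order. */
static void sketch_repack_halves(uint8_t w[8])
{
	uint8_t tmp[4];

	memcpy(tmp, w, 4);
	memcpy(w, w + 4, 4);
	memcpy(w + 4, tmp, 4);
}
```

Starting from [BE_pid][BE_tid], sketch_swap64_bytes() leaves [LE_tid][LE_pid] in memory, so a reader that expects pid first sees the tid; sketch_repack_halves() restores the pid-then-tid order with both fields now little-endian.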
> >
> > Changes since v17:
> > - Patch 2: Reordered ksymbol deletion logic to ensure
> > `perf_event__process_ksymbol` deletes the map *after* the
> > `aslr_tool__findnew_mapping` translates the unregister offsets.
> > - Patch 2: Changed `aslr_tool__delete` to cleanly handle guest machine
> > deletion memory leaks.
> > - Patch 2: Resolved read-only segfaults on memory-mapped perf.data
> > headers during attribute stripping by using deep copies in
> > `perf_event__repipe_attr`.
> > - Patch 2: Fixed user space remap invariant logic to include
> > `(start - map__start(al.map))` preventing negative overflows on module
> > offset boundaries.
> > - Patch 3: Removed duplicate `bswap_64` payload byte-swapping inside the
> > array logic, allowing the host endianness macros `COPY_U64()` to
> > handle it dynamically.
> > - Patch 3: Fixed LBR branch sample starvation by explicitly reading branch
> > counters instead of dropping the entire sample.
> > - Patch 5: Fixed test flakiness by grepping out physical hex addresses
> > `0x[0-9a-f]{8,}` instead of matching exact address strings.
> > - Patch 5: Parameterized temp reports and updated test to scale with
> > `/dev/urandom` continuous random reads.
> > - Patch Series: Added Signed-off-by tags uniformly and Assisted-by tags to
> > track assistance.
> >
> > Changes since v16:
> > - Patch 2: Refactored inline ASLR stripping logic out of builtin-inject.c
> > and into dedicated helpers (aslr_tool__strip_attr_event and
> > aslr_tool__strip_evlist) in aslr.c to better separate concerns.
> > - Patch 2: Fixed guest machine allocation memory leak in
> > aslr_tool__delete() where machines__exit() explicitly skipped freeing
> > the guest processes tree.
> > - Patch 3: Fixed bounds-check violations during cross-endian parsing inside
> > aslr_tool__process_sample() by correctly applying bswap_64() to raw
> > offsets, iteration counts, sizes, and addresses prior to logical
> > evaluation when orig_needs_swap is active.
> > - Patch 4: Fixed pipe mode parser misalignment bug by safely fetching
> > needs_swap from the initialized evsel rather than blindly intercepting
> > HEADER_ATTR events prior to session parsing.
> > - Patch 4: Resolved checkpatch.pl line length warnings in the bswap_64
> > endianness swapping logic.
> > - Patch Series: Reordered the final two patches. "perf aslr: Strip
> > sample registers" is now Patch 4, and "perf test: Add inject ASLR
> > test" is now Patch 5. This ensures the register stripping logic
> > is fully introduced before the comprehensive shell tests validate it,
> > preventing bisectability test failures and easing merge conflicts.
> > - Patch 5: Fixed "User registers stripping test" starvation when run as
> > root by explicitly using '-e cycles:u' during recording, preventing
> > the ring buffer from overflowing with kernel samples.
> >
> > Changes since v15:
> > - Patch 2: Added bounds checking for event->header.size before writing
> > to breakpoint fields to avoid heap buffer overflow on older ABI events.
> > - Patch 2: Fixed asymmetric calculation bug in aslr_tool__findnew_mapping()
> > where pgoff for anonymous kernel memory was not properly subtracted upon
> > insertion, causing the lookup addition to overflow.
> > - Patch 2: Added detailed comments documenting the symmetric lookup and
> > insertion math for unmapped and mapped memory blocks.
> > - Patch 5: Add missing kprobe and uprobe scrubbing of config1 and
> > config2 during aslr_tool__strip_evlist() to strictly conform with
> > repipe constraints.
> >
> > Changes since v14:
> > - Patch 2: Removed unnecessary vertical whitespace in builtin-inject.c.
> > - Patch 2: Added comments explaining why pgoff is assigned for
> > anonymous memory maps to prevent ASLR leaks.
> > - Patch 2: Removed orig_last_end tracking and refactored contiguous mapping
> > detection to use thread__find_map(..., start - 1, ...) based on Gabriel's
> > feedback.
> > - Patch 2: Scrub kprobe/uprobe event config1 and config2 fields to prevent
> > address leaks.
> > - Patch 2: Overwrite pgoff with the remapped start address for anonymous
> > mappings (detected via is_anon_memory and is_no_dso_memory).
> > - Patch 3: Fix C90 mixed declaration error for orig_needs_swap.
> > - Patch 3: Temporarily disable evsel->needs_swap during the secondary
> > evsel__parse_sample() call to prevent branch stack double-swapping bugs.
> >
> > Changes since v13:
> > - Patch 2: Added a NULL check for env before calling
> > perf_env__kernel_is_64_bit(env) to prevent potential segfaults if the
> > recorded environment has no headers.
> > - Patch 5: Fixed sample_size and id_pos going out of sync during
> > aslr_tool__strip_evlist() and aslr_tool__restore_evlist(). Instead of
> > using evsel__reset_sample_bit(), which was acting as a no-op due to
> > early bit clearing and corrupted sample_size, the tool now directly
> > updates sample_type and recomputes sample_size/id_pos dynamically.
> > Added orig_sample_size to aslr_evsel_priv to correctly restore the
> > state.
> >
> > Changes since v12:
> > - Patch 2: Fixed potential NULL pointer dereference in
> > remap_addresses__hash() when handling unmapped memory events (key->dso
> > is NULL) under REFCNT_CHECKING.
> > - Patch 2: Dynamically detect machine architecture bitness via
> > perf_env__kernel_is_64_bit() to select appropriate kernel_space_start
> > boundaries, avoiding 64-bit address injection on 32-bit platforms.
> >
> > Changes since v11:
> > - Patch 1: Fixed struct dso name accessor in maps.c by using
> > dso__name() instead of ->name.
> > - Patch 2: Fixed hash function in aslr.c to hash the underlying
> > dso pointer using RC_CHK_ACCESS to support reference count checking.
> >
> > Changes since v10:
> > - Patch 1: Added explicit tracking array logic in maps__load_maps()
> > to correctly accumulate valid maps (skipping NULL entries after
> > failures) and safely return the exact populated count, resolving
> > out-of-bounds pointer iteration panics.
> > - Patch 3: Fixed endianness bug during cross-endian sample parsing
> > by passing evsel->needs_swap instead of false to __evsel__parse_sample
> > in aslr.c, ensuring correct 32-bit field byte unswapping for packed
> > fields. Refactored evsel__parse_sample to take a needs_swap argument
> > via __evsel__parse_sample.
> > - Patch 4: Fixed inject_aslr.sh exit code handling in trap functions
> > to capture and propagate the correct pipeline failure status code
> > instead of unconditionally returning success or failing the test.
> >
> > Changes since v9:
> > - Patch 1: Added `-ENOMEM` error check inside
> > `maps__find_symbol_by_name()` and return `NULL` early. Added map
> > sorting state invalidation on early return in `maps__load_maps()`.
> > - Patch 2: Fixed encapsulation by using `thread__maps()` and
> > `thread__pid()` accessors in `aslr_tool__findnew_mapping()`. Added
> > `pr_warning_once` warning when raw auxtrace data is dropped.
> > - Patch 3: Fixed encapsulation by using `thread__maps()` and
> > `thread__pid()` accessors in `aslr_tool__remap_address()`. Wrapped
> > `evsel__parse_sample()` to temporarily disable `needs_swap` to avoid
> > branch stack endianness corruption on cross-endian files. Fixed ISO
> > C90 warning for declaration-after-statement for `orig_needs_swap`.
> > - Patch 4: Fixed duplicate cleanup by explicitly removing trap
> > handlers (`trap - EXIT TERM INT`) inside the `cleanup()` function.
> > - Patch 5: Fixed heap corruption by adding size bounds checking before
> > writing to `sample_regs_user` and `sample_regs_intr` fields. Added
> > missing register mask clearing logic for the `itrace` synthesis path
> > of `perf_event__repipe_attr()`.
> >
> > Ian Rogers (5):
> > perf maps: Add maps__mutate_mapping
> > perf inject/aslr: Add ASLR tool infrastructure and MMAP tracking
> > perf inject/aslr: Implement sample address remapping
> > perf aslr: Strip sample registers
> > perf test: Add inject ASLR test
>
> The sashiko reviews are at:
> https://sashiko.dev/#/patchset/20260611164122.3974068-1-irogers%40google.com
>
> To summarize:
>
> Patch 2:
> * TOCTOU if underlying event buffer mmaps change. Not an issue as
> rewriting a perf.data file while it is being read is out of scope.
>
> Patch 3:
> * Mapping addresses to 0 for unknown mappings is criticized but the
Why not then have an unknown-mappings hashmap that assigns a random,
unique address in an address range that doesn't overlap with any of the
other maps?
Zero has special meaning; mapping some non-zero address to it introduces
confusion when all we want is to make sure that we don't leak addresses.
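Roughly what I have in mind, as a sketch only: the names (unk_table, unk_remap) are hypothetical, the reserved base is assumed to be chosen by the caller so that it is non-zero and overlaps no remapped map, and pages are handed out sequentially rather than randomly to keep the example short. It keeps the in-page offset, on the assumption that the low 12 bits carry no ASLR entropy.

```c
#include <stddef.h>
#include <stdint.h>

#define UNK_PAGE_SIZE 4096ULL
#define UNK_MAX_PAGES 128

/* Table of original pages seen in samples that hit no known mapping. */
struct unk_table {
	uint64_t base;                 /* start of the reserved, non-zero range */
	uint64_t pages[UNK_MAX_PAGES]; /* original page addresses, in order seen */
	size_t nr;
};

/*
 * Give every unknown original page its own page in the reserved range.
 * Returns 0 only when the table is full, so the caller can drop the
 * sample instead of aliasing it onto another page.
 */
static uint64_t unk_remap(struct unk_table *t, uint64_t addr)
{
	uint64_t page = addr & ~(UNK_PAGE_SIZE - 1);
	uint64_t off = addr & (UNK_PAGE_SIZE - 1);
	size_t i;

	for (i = 0; i < t->nr; i++) {
		if (t->pages[i] == page)
			break;
	}
	if (i == t->nr) {
		if (t->nr == UNK_MAX_PAGES)
			return 0;
		t->pages[t->nr++] = page; /* i is the index just filled */
	}
	return t->base + i * UNK_PAGE_SIZE + off;
}
```

Distinct unknown addresses then stay distinct in the output, repeated hits on the same page still cluster together in reports, and nothing ends up at address 0.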
> proposed alternative doesn't hide ASLR. This will cluster things on
> address 0 but the fix is simply to ensure no MMAPs are missing.
> * Cross-endian issues, but as explained previously, these are out of scope.
>
> The clang build issue reported by James and disabling the kernel
> testing for ARM are both in the v20 series. So I think the patches are
> ready for review/merging.
I reviewed one other patch in the series besides the above suggestion.
Thanks, it's a useful feature!
- Arnaldo
> Thanks,
> Ian
>
> > tools/perf/builtin-inject.c | 81 +-
> > tools/perf/tests/shell/inject_aslr.sh | 533 ++++++++++
> > tools/perf/util/Build | 1 +
> > tools/perf/util/aslr.c | 1406 +++++++++++++++++++++++++
> > tools/perf/util/aslr.h | 44 +
> > tools/perf/util/evsel.c | 6 +-
> > tools/perf/util/evsel.h | 10 +-
> > tools/perf/util/machine.c | 32 +-
> > tools/perf/util/maps.c | 148 ++-
> > tools/perf/util/maps.h | 3 +
> > tools/perf/util/symbol-elf.c | 41 +-
> > tools/perf/util/symbol.c | 17 +-
> > 12 files changed, 2251 insertions(+), 71 deletions(-)
> > create mode 100755 tools/perf/tests/shell/inject_aslr.sh
> > create mode 100644 tools/perf/util/aslr.c
> > create mode 100644 tools/perf/util/aslr.h
> >
> > --
> > 2.54.0.1099.g489fc7bff1-goog
> >