Re: [PATCH 26/28] perf timechart: Bounds check cpu_id and fix topology_map allocation
From: Arnaldo Carvalho de Melo
Date: Tue May 12 2026 - 15:48:31 EST
On Tue, May 12, 2026 at 11:32:48AM -0700, Ian Rogers wrote:
> On Sat, May 9, 2026 at 8:37 PM Arnaldo Carvalho de Melo <acme@xxxxxxxxxx> wrote:
> >
> > From: Arnaldo Carvalho de Melo <acme@xxxxxxxxxx>
> >
> > The cpu_idle, cpu_frequency, power_start, and power_frequency
> > tracepoint handlers extract cpu_id from the event payload via
> > evsel__intval() and use it directly as an array index into
> > cpus_cstate_start_times[] and cpus_pstate_start_times[], which
> > are allocated with MAX_CPUS (4096) entries.
> >
> > Unlike sample->cpu, which is validated in perf_session__deliver_event(),
> > cpu_id comes from the tracepoint data and is never bounds checked.
> > A crafted perf.data with a malicious cpu_id in a tracepoint event
> > causes out-of-bounds array accesses.
> >
> > Validate cpu_id against tchart->numcpus (nr_cpus_avail from the
> > file header) and reject the event with an error if it is out of
> > range, as this indicates a corrupted or crafted file.
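
For readers following along, the check described above can be sketched
roughly as below. This is only an illustration under stated assumptions,
not the code from the patch: struct timechart_sketch, set_numcpus() and
record_cstate_start() are hypothetical names, and only MAX_CPUS (4096),
numcpus and cpus_cstate_start_times[] come from the description. The cap
in set_numcpus() mirrors the HEADER_NRCPUS cap the commit message
mentions.

```c
#include <stdint.h>
#include <stdio.h>

#define MAX_CPUS 4096

/* Hypothetical stand-in for the timechart state; only the field used here. */
struct timechart_sketch {
	uint32_t numcpus;	/* CPU count from the file header, capped below */
};

static uint64_t cpus_cstate_start_times[MAX_CPUS];

/* Cap the header-provided CPU count so it never exceeds the array size. */
static void set_numcpus(struct timechart_sketch *tchart, uint32_t nr_cpus_avail)
{
	tchart->numcpus = nr_cpus_avail > MAX_CPUS ? MAX_CPUS : nr_cpus_avail;
}

/*
 * Store a C-state start time for cpu_id.
 * Returns 0 on success, -1 if cpu_id is out of range, which would
 * indicate a corrupted or crafted file.
 */
static int record_cstate_start(struct timechart_sketch *tchart,
			       uint64_t cpu_id, uint64_t timestamp)
{
	if (cpu_id >= tchart->numcpus) {
		fprintf(stderr, "Invalid cpu_id %llu (numcpus %u)\n",
			(unsigned long long)cpu_id, tchart->numcpus);
		return -1;
	}
	cpus_cstate_start_times[cpu_id] = timestamp;
	return 0;
}
```

Because the comparison is unsigned, a (u32)-1 sentinel is rejected by the
same test as any other out-of-range value.
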
> >
> > The power_end handler uses sample->cpu (not a tracepoint cpu_id
> > field). Add a bounds check there too since a crafted file could
> > omit PERF_SAMPLE_CPU, leaving sample->cpu as the (u32)-1 sentinel
> > which would cause out-of-bounds access in c_state_end().
> >
> > Also validate sample->cpu in sched_switch and sched_wakeup
> > handlers, which store it in cpu_sample structs later used as
> > array indices into topology_map[] during SVG generation.
> >
> > Fix svg_build_topology_map() to allocate topology_map using
> > nr_cpus_avail instead of nr_cpus_online. When offline CPUs exist,
> > nr_cpus_online < nr_cpus_avail, and a valid cpu_id that passes
> > the numcpus check could still exceed the topology_map allocation,
> > causing a heap out-of-bounds read in cpu2y(). Reject negative CPU
> > values in str_to_bitmap() to prevent perf_cpu_map__new("") on an
> > empty topology string from passing -1 to __set_bit(), which would
> > write at offset ULONG_MAX/BITS_PER_LONG.
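
To illustrate the str_to_bitmap() part: a negative CPU number used as a
bit index gets converted to a huge unsigned value, so the word offset
lands far outside the bitmap. A minimal sketch of the guard, assuming a
plain unsigned long bitmap, is below. set_cpu_bit() and
BITS_PER_LONG_SKETCH are hypothetical names for illustration only; the
real code uses __set_bit() and the perf bitmap helpers.

```c
#include <limits.h>
#include <stddef.h>

/* Illustrative equivalent of BITS_PER_LONG for this sketch. */
#define BITS_PER_LONG_SKETCH (sizeof(unsigned long) * CHAR_BIT)

/*
 * Set the bit for 'cpu' in 'bitmap', which holds 'nbits' bits.
 * Returns 0 on success, -1 if cpu is negative or beyond the bitmap,
 * so the caller never writes outside the allocation.
 */
static int set_cpu_bit(unsigned long *bitmap, size_t nbits, int cpu)
{
	size_t bit;

	if (cpu < 0 || (size_t)cpu >= nbits)
		return -1;

	bit = (size_t)cpu;
	bitmap[bit / BITS_PER_LONG_SKETCH] |= 1UL << (bit % BITS_PER_LONG_SKETCH);
	return 0;
}
```

The same reasoning applies to topology_map: the allocation has to be
sized by the largest index that can pass validation (nr_cpus_avail), not
by a smaller count such as nr_cpus_online.
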
> >
> > Fix the pre-existing backtrace memory leak: change the
> > tracepoint_handler typedef to pass const char **backtrace
> > (pointer-to-pointer). Handlers that consume the string
> > (sched_switch, sched_wakeup) set *backtrace = NULL to claim
> > ownership. The caller always calls free() after the handler
> > returns — if ownership was taken the pointer is NULL and
> > free(NULL) is a no-op. Skip cat_backtrace() entirely when
> > tchart->with_backtrace is not set.
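
The ownership convention described above can be sketched as follows.
This is a simplified illustration, not the patch itself: the struct,
the handler names, make_backtrace() (standing in for cat_backtrace())
and dispatch() are all hypothetical, and the real tracepoint_handler
typedef takes more arguments.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical sample record that may keep the backtrace string. */
struct cpu_sample_sketch {
	const char *backtrace;
};

/* Handlers get a pointer-to-pointer so they can claim the string. */
typedef void (*handler_fn)(struct cpu_sample_sketch *s, const char **backtrace);

/* Consuming handler: keeps the string and clears the caller's pointer. */
static void consuming_handler(struct cpu_sample_sketch *s, const char **backtrace)
{
	s->backtrace = *backtrace;
	*backtrace = NULL;	/* ownership taken */
}

/* Non-consuming handler: leaves the string for the caller to free. */
static void ignoring_handler(struct cpu_sample_sketch *s, const char **backtrace)
{
	(void)s;
	(void)backtrace;
}

/* Stand-in for cat_backtrace(): returns a heap-allocated string or NULL. */
static char *make_backtrace(void)
{
	static const char text[] = "frame_a <- frame_b";
	char *copy = malloc(sizeof(text));

	if (copy)
		memcpy(copy, text, sizeof(text));
	return copy;
}

/* Caller: builds the backtrace only when requested, then always frees. */
static void dispatch(handler_fn handler, struct cpu_sample_sketch *s,
		     int with_backtrace)
{
	const char *backtrace = NULL;

	if (with_backtrace)
		backtrace = make_backtrace();

	handler(s, &backtrace);

	/* NULL if the handler took ownership; free(NULL) is a no-op. */
	free((void *)backtrace);
}
```

With this shape the caller needs no per-handler knowledge: a handler
that stores the string is then responsible for freeing it later, and
every other path is freed right after the handler returns.
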
> >
> > Cap tchart->numcpus at MAX_CPUS in the HEADER_NRCPUS callback
> > so the bounds check cannot exceed the array allocation size.
>
> So there are some overlaps with the changes in:
> https://lore.kernel.org/lkml/20260413041143.1736055-18-irogers@xxxxxxxxxx/
> I'll repost the series that Namhyung started merging. It would be good
> to rebase these changes on that.
Please rebase and resubmit; I can adjust before sending v2 of the perf
data validation series.
- Arnaldo