Re: [PATCH] perf cpumap: Fix buffer overflow in cpu_map__snprint()

From: James Clark

Date: Thu Jun 11 2026 - 05:54:25 EST




On 11/06/2026 10:41 am, Tianchen Ding wrote:
When SMT is disabled on a large-core-count ARM64 system (e.g., 192
cores), the online CPU list becomes a long sequence of non-contiguous
even numbers (0,2,4,...,382) that cannot be range-compressed. This
string can far exceed the 128-byte stack buffer used by callers like
evlist__warn_user_requested_cpus().

The root cause is that snprintf() returns the number of characters that
*would* have been written if the buffer were large enough, not the
actual number written. When the cumulative return value 'ret' exceeds
'size', the expression 'size - ret' wraps around to a huge value
(since both are size_t / unsigned), causing subsequent snprintf() calls
to write far beyond the buffer boundary, corrupting the stack canary
and resulting in:

WARNING: A requested CPU in '1' is not supported by PMU 'cpu' (CPUs
0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,
52,54,56,58,60,62,64,66,68,70,72,74,76,78,80,82,84,86,) for event 'cycles'
*** stack smashing detected ***: terminated
Aborted (core dumped)

Fix this by adding a bounds check at the top of the loop: once ret
reaches or exceeds size, stop appending. This follows the same pattern
as snprintf() itself - the returned value may exceed size to indicate
truncation occurred, but no out-of-bounds write will happen.

Signed-off-by: Tianchen Ding <dtcccc@xxxxxxxxxxxxxxxxx>
---
tools/perf/util/cpumap.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/tools/perf/util/cpumap.c b/tools/perf/util/cpumap.c
index 21fa781b03cc..8328f18b7a84 100644
--- a/tools/perf/util/cpumap.c
+++ b/tools/perf/util/cpumap.c
@@ -680,6 +680,9 @@ size_t cpu_map__snprint(struct perf_cpu_map *map, char *buf, size_t size)
 		struct perf_cpu cpu = { .cpu = INT16_MAX };
 		bool last = i == (int)perf_cpu_map__nr(map);

+		if (ret >= size)
+			break;
+
 		if (!last)
 			cpu = perf_cpu_map__cpu(map, i);

I think this covers up the root cause a little bit. It doesn't fix the human-readable output by adding a truncation mark ("..."), and it definitely doesn't help any scripts that parse this output.
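To illustrate the kind of truncation marking meant here, a minimal sketch follows. It is not perf code: bounded_append() and format_even_cpus() are hypothetical names, and the even-numbered CPU list only mimics the SMT-off case from the report. The helper checks the offset before computing the remaining space, so the unsigned subtraction cannot wrap, and the caller ends a cut-off string with "...".

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not an existing perf function): append formatted
 * text at buf + *off. Returns false when the text did not fit.
 */
static bool bounded_append(char *buf, size_t size, size_t *off,
			   const char *fmt, ...)
{
	va_list ap;
	int n;

	/* Check before subtracting: size - *off would wrap if *off > size. */
	if (*off >= size)
		return false;

	va_start(ap, fmt);
	n = vsnprintf(buf + *off, size - *off, fmt, ap);
	va_end(ap);

	/* vsnprintf() returns the length it wanted to write, not what it wrote. */
	if (n < 0 || (size_t)n >= size - *off)
		return false;

	*off += (size_t)n;
	return true;
}

/*
 * Hypothetical caller: print nr even CPU numbers (0,2,4,...) and end the
 * string with "..." if they did not all fit. Returns true if nothing was cut.
 */
static bool format_even_cpus(char *buf, size_t size, int nr)
{
	size_t off = 0;

	assert(size >= 4);	/* room for "..." plus the terminating NUL */
	buf[0] = '\0';

	for (int i = 0; i < nr; i++) {
		if (!bounded_append(buf, size, &off, "%s%d", i ? "," : "", 2 * i)) {
			/* Overwrite the tail so the cut is visible to the reader. */
			memcpy(buf + size - 4, "...", 4);
			return false;
		}
	}
	return true;
}
```

With a 16-byte buffer and 192 CPUs this yields "0,2,4,6,8,10...". That is readable for a human, but a script parsing the list still gets incomplete data, which is why marking the truncation alone is not a full answer.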

I checked the callers, and it is generally used for output. We can't know whether that output is read by scripts or by humans, so we should probably make it always produce something sane. However, there is one internal use in evsel__tpebs_start_perf_record(), which uses a very small buffer, so this change might hide a problem that is currently visible.

It isn't a huge change to replace this with a version that allocates and always works. There's no real reason not to allocate, other than that it was more convenient to write the code this way in the first place. At least there are only 15 calls to it, not 1000.
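As a rough sketch of what an allocating version could look like, assuming a plain sorted array of CPU numbers instead of struct perf_cpu_map: cpu_list_to_str() is a hypothetical name, and the "0-3,8" range format is only meant to resemble the existing output. The buffer is grown and the piece retried whenever snprintf() reports that it did not fit, so the result is never truncated.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical allocating formatter (not the perf API): turn a sorted array
 * of CPU numbers into a heap string such as "0-3,8,10-11".
 * Returns NULL on allocation or formatting failure; the caller frees it.
 */
static char *cpu_list_to_str(const int *cpus, size_t nr)
{
	size_t cap = 64, len = 0;
	char *buf = malloc(cap);

	if (!buf)
		return NULL;
	buf[0] = '\0';

	for (size_t i = 0; i < nr; ) {
		size_t j = i;
		int n;

		/* Extend [i, j] over a run of consecutive CPU numbers. */
		while (j + 1 < nr && cpus[j + 1] == cpus[j] + 1)
			j++;

		for (;;) {
			char *tmp;

			if (j > i)
				n = snprintf(buf + len, cap - len, "%s%d-%d",
					     len ? "," : "", cpus[i], cpus[j]);
			else
				n = snprintf(buf + len, cap - len, "%s%d",
					     len ? "," : "", cpus[i]);
			if (n < 0) {
				free(buf);
				return NULL;
			}
			if ((size_t)n < cap - len)
				break;	/* it fit, including the NUL */

			/* Did not fit: grow until it will, then retry this piece. */
			while (cap - len <= (size_t)n)
				cap *= 2;
			tmp = realloc(buf, cap);
			if (!tmp) {
				free(buf);
				return NULL;
			}
			buf = tmp;
		}

		len += (size_t)n;
		i = j + 1;
	}

	return buf;
}
```

The loop keeps len strictly below cap, so 'cap - len' never wraps. For the 192 even CPUs from the report (0,2,...,382) the full list is 712 characters, well past any of the small stack buffers in use today.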

Thanks
James