[PATCH] perf/record: make perf_event__synthesize_mmap_events() scale
From: Stephane Eranian
Date: Wed Mar 15 2017 - 02:57:42 EST
This patch significantly improves the execution time of
perf_event__synthesize_mmap_events() when running perf record
on systems where processes have lots of threads. The kernel code
behind /proc/pid/maps uses an O(N^2) algorithm to generate the maps
file. If a process has 1000 threads, then it necessarily has 1000
stacks, and for each vma the kernel needs to check whether it
corresponds to a thread's stack. With a large number of threads, this
can take a very long time. I have seen latencies well over 10 minutes.
As of today, perf does not use the fact that a mapping is a stack,
therefore we can work around the issue by using /proc/pid/task/pid/maps.
This entry does not try to map a vma to a stack and is thus much
faster with no loss of functionality.
The proc-map-timeout logic is kept in case users still want an upper
limit.
Signed-off-by: Stephane Eranian <eranian@xxxxxxxxxx>
---
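For reference, here is a minimal standalone sketch of the workaround
described above, outside of perf. It only builds the per-task maps path
(procfs exposes per-thread entries under task/<tid>) and counts the
mapping lines it can parse. The helper names build_task_maps_path() and
count_map_lines() are made up for this illustration and are not perf
functions; the 4096-byte line buffer is likewise an assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/*
 * Hypothetical helper: build "<root_dir>/proc/<pid>/task/<pid>/maps".
 * Returns the snprintf() result, i.e. the length the full path needs.
 */
int build_task_maps_path(char *buf, size_t len,
			 const char *root_dir, pid_t pid)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "%s/proc/%d/task/%d/maps",
			root_dir, (int)pid, (int)pid);
}

/*
 * Hypothetical helper: count the lines of a maps-style file that start
 * with a "start-end" hex address range. Returns -1 if the file cannot
 * be opened. Lines longer than the buffer are not handled specially.
 */
long count_map_lines(const char *filename)
{
	char line[4096];
	unsigned long start, end;
	long n = 0;
	FILE *fp = fopen(filename, "r");

	if (fp == NULL)
		return -1;

	while (fgets(line, sizeof(line), fp) != NULL) {
		if (sscanf(line, "%lx-%lx", &start, &end) == 2)
			n++;
	}

	fclose(fp);
	return n;
}
```

Calling build_task_maps_path(path, sizeof(path), "", getpid()) and
passing the result to count_map_lines() should report the mappings of
the calling process on a system with procfs mounted.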
tools/perf/util/event.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index 4ea7ce7..b137566 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -255,8 +255,8 @@ int perf_event__synthesize_mmap_events(struct perf_tool *tool,
if (machine__is_default_guest(machine))
return 0;
- snprintf(filename, sizeof(filename), "%s/proc/%d/maps",
- machine->root_dir, pid);
+ snprintf(filename, sizeof(filename), "%s/proc/%d/task/%d/maps",
+ machine->root_dir, pid, pid);
fp = fopen(filename, "r");
if (fp == NULL) {
--
2.5.0