Re: [BUG] perf: memory leak in perf top

From: Wangnan (F)
Date: Mon Aug 08 2016 - 07:35:32 EST




On 2016/8/5 22:12, Arnaldo Carvalho de Melo wrote:
Em Fri, Aug 05, 2016 at 06:58:12PM +0800, Wangnan (F) escreveu:
perf top comsumes too much memory than it need.

Using following commands:

# yes '' > /dev/null
# perf top -e raw_syscalls:sys_enter

System will quickly become unresponsive because of out of memory.

Using GDB I find two call stacks of malloc. See below.

[call stack 1]

#0 __GI___libc_malloc (bytes=140723951319552) at malloc.c:2841
#1 0x00000000004b7246 in memdup (src=0x7f515e04b054, len=68) at
../lib/string.c:29
#2 0x00000000004f9299 in hist_entry__init (sample_self=true,
template=0x7ffcd9213a80, he=0xa37da50) at util/hist.c:401
#3 hist_entry__new (template=template@entry=0x7ffcd9213a80
<mailto:%28template=template@entry=0x7ffcd9213a80>,
sample_self=sample_self@entry=true)
<mailto:sample_self=sample_self@entry=true%29> at util/hist.c:458
#4 0x00000000004fa13f in hists__findnew_entry (hists=0x3549d30,
hists=0x3549d30, al=0x7ffcd9213ca0, sample_self=true, entry=0x
7ffcd9213a80) at util/hist.c:544
#5 __hists__add_entry (hists=0x3549d30, al=0x7ffcd9213ca0,
sym_parent=<optimized out>, bi=bi@entry=0x0 <mailto:bi=bi@entry=0x0>,
mi=mi@entry=0x0 <mailto:mi=mi@entry=0x0>, sampl
e=<optimized out>, sample_self=sample_self@entry=true
<mailto:sample_self=sample_self@entry=true>, ops=ops@entry=0x0)
<mailto:ops=ops@entry=0x0%29> at util/hist.c:600
#6 0x00000000004fa3d6 in hists__add_entry (sample_self=true,
sample=<optimized out>, mi=0x0, bi=0x0, sym_parent=<optimized out
, al=<optimized out>, hists=<optimized out>) at util/hist.c:611
#7 iter_add_single_normal_entry (iter=0x7ffcd9213ce0, al=<optimized out>) at
util/hist.c:823
#8 0x00000000004fc3c5 in hist_entry_iter__add (iter=0x7ffcd9213ce0,
al=0x7ffcd9213ca0, max_stack_depth=<optimized out>, arg=0x
7ffcd9214120) at util/hist.c:1030
#9 0x00000000004536d7 in perf_event__process_sample (machine=<optimized
out>, sample=0x7ffcd9213d30, evsel=0x3549bb0, event=<o
ptimized out>, tool=0x7ffcd9214120) at builtin-top.c:800
#10 perf_top__mmap_read_idx (top=top@entry=0x7ffcd9214120
<mailto:%28top=top@entry=0x7ffcd9214120>, idx=idx@entry=0)
<mailto:idx=idx@entry=0%29> at builtin-top.c:866
#11 0x00000000004559fd in perf_top__mmap_read (top=0x7ffcd9214120) at
builtin-top.c:883
#12 __cmd_top (top=0x7ffcd9214120) at builtin-top.c:1018
#13 cmd_top (argc=<optimized out>, argv=<optimized out>, prefix=<optimized
out>) at builtin-top.c:1344
#14 0x0000000000493561 in run_builtin (p=p@entry=0x948000
<mailto:%28p=p@entry=0x948000> <commands+288>, argc=argc@entry=4
<mailto:argc=argc@entry=4>, argv=argv@entry=0x7ffcd92199b0)
<mailto:argv=argv@entry=0x7ffcd92199b0%29> at
perf.c:357
#15 0x0000000000437157 in handle_internal_command (argv=0x7ffcd92199b0,
argc=4) at perf.c:419
#16 run_argv (argv=0x7ffcd9219720, argcp=0x7ffcd921972c) at perf.c:465
#17 main (argc=4, argv=0x7ffcd92199b0) at perf.c:609

[call stack 2]
#0 __GI___libc_malloc (bytes=140723951319456) at malloc.c:2841
#1 0x00000000005766de in trace_seq_init (s=0x7ffcd92139b0) at trace-seq.c:60
#2 0x00000000004f7047 in get_trace_output (he=0x7ffcd9213a80) at
util/sort.c:584
Yeah, this allocates 4K, way too much, can you try with this patch,
which amounts to a trace_seq_trim()?

diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index 947d21f38398..37ddcf57500f 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -588,7 +588,7 @@ static char *get_trace_output(struct hist_entry *he)
} else {
pevent_event_info(&seq, evsel->tp_format, &rec);
}
- return seq.buffer;
+ return realloc(seq.buffer, seq.len);
}
static int64_t

It works: from 1.8g/10s to 40m/10s, but unfortunately your
patch is wrong: please use realloc(seq.buffer, seq.len + 1)
because of the ending '\0' :)

In my experiment RSS continuously increases because I use
raw_syscalls, which can result in too much entries to record
in perf top because of the nature of argument list (garbage
arguments are also considered by perf). It becomes much better
if I use --fields to reduce the number of possible entries.
However, theoretically perf will eventually consume all
system memory, unless we carefully choose tracepoints and
fields to make sure the possible entries is limited.

Thank you.