On Thu, 2009-08-06 at 10:48 +0300, Pekka Enberg wrote:
Hi,
It's me again :-).
I have a little user-space application that is pretty memory hungry and I want to understand why. I started to google around for a memory profiler or a malloc() tracer but didn't seem to find anything really useful.
But then it hit me, why can't I have kmemtrace + perf but for user-space? Something like the "Malloc Trace" shown here:
http://developer.apple.com/documentation/developertools/conceptual/SharkUserGuide/OtherProfilingandTracingTechniques/OtherProfilingandTracingTechniques.html#//apple_ref/doc/uid/TP40005233-CH6-SW17
Does this sound like something that could/should be part of "perf"? How would all this work anyway? Can we intercept malloc() and free() somehow? Where would the data be pushed? Am I just going perf-crazy and trying to turn it into a swiss army knife because it's so easy to use?-)
OK, you just trod into a wasp's nest :-)
That sounds like uprobes, the equivalent of kprobes but for userspace.
I seem to have heard people are working on such a thing, but I can't
seem to find a single LKML post with 'uprobe' in the subject in the past
two years or something (except for MTUprobe) -- so I guess it's not
really going anywhere fast.
Now doing probes on userspace is hard because you need to know more
about the userspace bits than a kernel really ought to be interested in.
Then again, the only way to extract bits from userspace is to stop it --
now one could do that using [pu]trace and have some monitoring app prod
at it like any other debugger would, and I think this is the approach
suggested by some (hch iirc).
Others seem to think we ought to stuff all this into the kernel. I can
only imagine the pain that will cause, since you need to teach the
kernel about the context of these instrumentation sites, so I expect
it'd be through a kernel module interface much like SystemTap does
(they would be doing the in-kernel bit).
Then there are the tracer folks who also want to collect userspace
traces. Some have proposed a sys_trace() call, others want to play silly
games with mmap(), and then there is the uprobe idea. Others (tglx and
I) proposed letting userspace log itself and post-merge all the various
trace buffers to get a complete picture.
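
As a rough illustration of that last post-merge idea (nothing like this
exists in the kernel or perf as far as this thread says): assume every
trace source, kernel or userspace, stamps its records from one shared
clock and keeps its own buffer in time order. The record layout, the
field names and the sample events below are all made up for the sketch.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class TraceEvent(NamedTuple):
    """Hypothetical trace record; the fields are assumptions for this sketch."""
    timestamp_ns: int   # assumed to come from one clock shared by all sources
    source: str         # e.g. "kernel" or "app"
    message: str


def merge_trace_buffers(*buffers: Iterable[TraceEvent]) -> Iterator[TraceEvent]:
    """Merge per-source buffers, each already sorted by time, into one timeline."""
    # heapq.merge does a lazy k-way merge and relies on every input being sorted.
    return heapq.merge(*buffers, key=lambda event: event.timestamp_ns)


# Invented sample data standing in for a kernel buffer and a userspace buffer.
kernel_buffer = [
    TraceEvent(100, "kernel", "sys_enter brk"),
    TraceEvent(250, "kernel", "page fault"),
]
app_buffer = [
    TraceEvent(90, "app", "malloc(4096) enter"),
    TraceEvent(120, "app", "malloc(4096) return"),
    TraceEvent(300, "app", "free()"),
]

timeline = list(merge_trace_buffers(kernel_buffer, app_buffer))
for event in timeline:
    print(f"{event.timestamp_ns:>6} {event.source:<6} {event.message}")
```

The hard part in practice is the shared-clock assumption: if the sources
do not agree on time, the merged ordering means very little.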
Anyway, like you say, it has uses (potentially very powerful ones).
Sun/Apple do it with DTrace; Linux wants it, but I don't think we've quite
agreed on how to do it :-)
And here I see LKML isn't on the CC list, perhaps we should add it?