Re: [PATCH v6 0/4] perf: add support for profiling jitted code

From: Stephane Eranian
Date: Tue Mar 31 2015 - 19:55:17 EST


Hi Gregg,

On Tue, Mar 31, 2015 at 2:31 PM, Brendan Gregg
<brendan.d.gregg@xxxxxxxxx> wrote:
>
> On Tue, Mar 31, 2015 at 12:33 AM, Brendan Gregg
> <brendan.d.gregg@xxxxxxxxx> wrote:
> > G'Day Stephane,
> >
> > On Mon, Mar 30, 2015 at 3:19 PM, Stephane Eranian <eranian@xxxxxxxxxx> wrote:
> > [...]
> >> The current support only works when the runtime is monitored from
> > >> start to finish: perf record java -agentpath:libpfmjvmti.so my_class.
> >>
> >> Once the run is completed, the jitdump file needs to be injected into
> >> the perf.data file. This is accomplished by using the perf inject command.
> >> This will also generate an ELF image for each jitted function. The
> > >> injected MMAP records will point to those ELF images. The reasoning
> >> behind using ELF images is that it makes processing for perf report
> >> and annotate automatic and transparent. It also makes it easier to
> >> package and analyze on a remote machine.
> > [...]
> >
> > This is really impressive work. Do we have an idea of the overhead for
> > running the java agent?
> >


Thanks Gregg. Happy to see you find these patches useful. I think
with PeterZ's latest clock changes, things are easier to run now.

> > Today, I'm using perf-map-agent, loaded dynamically, to dump a
> > /tmp/perf*.map file as needed. My company has tens of thousands of
> > Linux instances running Java, but very few need profiling, and we
> > don't know which beforehand. So a snapshot-on-demand approach is
> > ideal. An always-on approach, well, we'd have to know the overhead (I
> > can build the agent and test...).
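For comparison, the snapshot approach mentioned here writes a plain-text symbol map that perf picks up as /tmp/perf-<pid>.map: one "START SIZE symbolname" line per jitted region, with START and SIZE in hex. The sketch below only shows how such lines could be formatted. The helper names format_perf_map_line and write_perf_map_entry are hypothetical, and this is not perf-map-agent's actual code.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/*
 * Format one line of a perf map file (/tmp/perf-<pid>.map):
 * "START SIZE symbolname\n", START and SIZE in hex without a 0x prefix.
 * Returns what snprintf returns (length wanted, or negative on error).
 * Hypothetical helper, for illustration only.
 */
static int format_perf_map_line(char *buf, size_t len, uint64_t start,
                                uint64_t size, const char *sym)
{
    return snprintf(buf, len, "%" PRIx64 " %" PRIx64 " %s\n",
                    start, size, sym);
}

/*
 * Append one entry to an already-open map file. A snapshot agent
 * would call something like this once per jitted method at dump time.
 */
static int write_perf_map_entry(FILE *f, uint64_t start, uint64_t size,
                                const char *sym)
{
    char line[512];
    int n = format_perf_map_line(line, sizeof(line), start, size, sym);

    if (n < 0 || (size_t)n >= sizeof(line))
        return -1; /* symbol too long for this sketch's fixed buffer */
    return fputs(line, f) == EOF ? -1 : 0;
}
```

Because the whole map is rewritten on demand, nothing runs in the JVM between snapshots; the trade-off is that code jitted (or moved) after the last snapshot has no symbol.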
>
> I built the agent and tested with an application framework
> micro-benchmark, and saw the performance overhead drop after start
> from about 13% initially (measured as a reduction in maximum req/sec
> given fixed CPU capacity), to 1.1% after a minute, and then 0.13%
> (which is really just noise) after several minutes of high load.
>
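The overhead figures above are a relative drop in the maximum request rate sustainable at fixed CPU capacity. As a minimal sketch of that arithmetic (the function name overhead_pct is hypothetical):

```c
#include <assert.h>

/*
 * Overhead as a percentage: relative reduction in maximum req/sec
 * at fixed CPU capacity, comparing a run without the agent
 * (baseline) against a run with it. Both rates must be positive.
 */
static double overhead_pct(double baseline_rps, double with_agent_rps)
{
    assert(baseline_rps > 0.0 && with_agent_rps > 0.0);
    return (baseline_rps - with_agent_rps) / baseline_rps * 100.0;
}
```

For example, a run that sustains 0.87 times the baseline request rate corresponds to a 13% overhead.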

If your JIT runtime does not keep recompiling, then yes, I expect the
overhead to be concentrated at startup and each time a new function
is first executed (and jitted). After that, no callbacks are really
needed, which is what you observed.

>
> So the overhead is basically zero after (minutes of) warmup, at least
> for my test. My jit.dump file reached 8 Mbytes, and was growing by a
> tiny amount every 30 seconds or so (hence the near-zero overhead). I'm
> much less concerned about overheads now.
>
> I'll test with a production workload if I can... But I'm still curious
> about why we're even doing this, instead of the previous method of
> taking symbol snapshots. Is there a backstory? If it involves a case
> of high symbol churn, then this should also mean non-zero overhead to
> constantly log.
>
Yes. Either the JIT runtime activates the agent at startup, or we need
a mechanism to kick the agent when perf is running.

As for the fsync() question: yes, there is a race between the JIT runtime
dumping records into the jitdump file and perf inject reading it. One
thing I will add is locking on the inject side, to make sure inject reads
a sane file (without truncated records). The jitdump layout does not
store the number of records in the file; inject just reads until EOF, so
that should be okay once the locks are in place. If you run perf inject,
then you are done with the collection. Pipe mode is still not
operational; I will look at it next. Hopefully we can also make it work
with the jitdump file.
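The read side described above (no record count, read until EOF, lock so a half-written record is never seen) can be sketched as follows. This is a minimal sketch under stated assumptions: the record prefix layout, the names walk_records and read_locked, and the use of flock(2) as the locking mechanism are all hypothetical and are not taken from the patch. It assumes the writer holds LOCK_EX while appending each record.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/file.h>
#include <unistd.h>

/* Hypothetical record prefix: id plus total size (prefix included).
 * An assumption for illustration, not the layout from the patch. */
struct rec_prefix {
    uint32_t id;
    uint32_t total_size;
    uint64_t timestamp;
};

/*
 * Walk records in buf[0..len) until EOF. With no record count in the
 * file, running out of bytes is the only stopping condition. A trailing
 * partial record is not counted; *consumed is set to the offset just
 * past the last complete record. Returns the number of complete
 * records, or -1 if a record claims a size smaller than its prefix.
 */
static long walk_records(const unsigned char *buf, size_t len, size_t *consumed)
{
    size_t off = 0;
    long n = 0;

    assert(buf != NULL || len == 0);
    while (len - off >= sizeof(struct rec_prefix)) {
        struct rec_prefix p;

        memcpy(&p, buf + off, sizeof(p)); /* avoid unaligned access */
        if (p.total_size < sizeof(p))
            return -1;                    /* corrupt size field */
        if (p.total_size > len - off)
            break;                        /* truncated tail record */
        off += p.total_size;
        n++;
    }
    *consumed = off;
    return n;
}

/*
 * Read the whole file while holding a shared flock(). flock() locks are
 * advisory, so this only helps if the writer also takes LOCK_EX around
 * each append. On success the caller frees *out.
 */
static int read_locked(int fd, unsigned char **out, size_t *out_len)
{
    size_t cap = 4096, len = 0;
    unsigned char *buf = malloc(cap);
    ssize_t r;

    if (!buf || flock(fd, LOCK_SH) != 0) {
        free(buf);
        return -1;
    }
    while ((r = read(fd, buf + len, cap - len)) > 0) {
        len += (size_t)r;
        if (len == cap) {                 /* grow before the next read */
            unsigned char *nb = realloc(buf, cap * 2);

            if (!nb) {
                flock(fd, LOCK_UN);
                free(buf);
                return -1;
            }
            buf = nb;
            cap *= 2;
        }
    }
    flock(fd, LOCK_UN);
    if (r < 0) {
        free(buf);
        return -1;
    }
    *out = buf;
    *out_len = len;
    return 0;
}
```

Even with the lock, walk_records() tolerates a truncated tail, so a reader that races with a writer that does not lock degrades to ignoring the last partial record rather than misparsing it.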