Re: [EXTERNAL] Re: [PATCH v4] perf inject --jit: Remove //anon mmap events

From: Nick Gasson
Date: Mon Jun 01 2020 - 06:02:58 EST


On 06/01/20 16:53 PM, Ian Rogers wrote:
> On Sun, May 31, 2020 at 11:17 PM Nick Gasson <nick.gasson@xxxxxxx> wrote:
>>
>> On 05/28/20 17:32 PM, Ian Rogers wrote:
>> >
>> > So on tip/perf/core with:
>> > 1c0cd2dbb993 perf jvmti: Fix jitdump for methods without debug info
>> > 3ce17c1e52f4 perf jvmti: remove redundant jitdump line table entries
>> >
>> > I've been trying variants of:
>> >
>> > Before:
>> > /tmp/perf/perf record -k 1 -e cycles:u -o /tmp/perf.data java
>> > -agentpath:/tmp/perf/libperf-jvmti.so -XX:+PreserveFramePointer
>> > -XX:InitialCodeCacheSize=20M -XX:ReservedCodeCacheSize=1G -jar
>> > dacapo-9.12-bach.jar jython
>> > /tmp/perf/perf inject -i /tmp/perf.data -o /tmp/perf-jit.data -j
>> > /tmp/perf/perf report -i /tmp/perf-jit.data |grep class\ |wc -l
>> > 578
>> > /tmp/perf/perf report -i /tmp/perf-jit.data |grep unknown |wc -l
>> > 6
>> >
>> > After:
>> > /tmp/perf/perf record -k 1 -e cycles:u -o /tmp/perf.data java
>> > -agentpath:/tmp/perf/libperf-jvmti.so -XX:+PreserveFramePointer
>> > -XX:InitialCodeCacheSize=20M -XX:ReservedCodeCacheSize=1G -jar
>> > dacapo-9.12-bach.jar jython
>> > /tmp/perf/perf inject -i /tmp/perf.data -o /tmp/perf-jit.data -j
>> > /tmp/perf/perf report -i /tmp/perf-jit.data |grep class\ |wc -l
>> > 589
>> > /tmp/perf/perf report -i /tmp/perf-jit.data |grep unknown |wc -l
>> > 60
>> >
>> > So maybe the jit cache isn't behaving in the way that the patch cures,
>> > though the uptick in unknowns appears consistent in my testing. I
>> > expect user error; is there an obvious explanation I'm missing?
>> >
>>
>> Hi Ian,
>>
>> I tried this as well with latest perf/core. The difference is that
>> unresolved addresses currently look like:
>>
>> 0.00% java [JIT] tid 221782 [.] 0x0000ffff451499a4
>> 0.00% java [JIT] tid 221782 [.] 0x0000ffff4514f3e8
>> 0.00% java [JIT] tid 221782 [.] 0x0000ffff45149394
>>
>> But after Steve's patch this becomes:
>>
>> 0.00% java [unknown] [.] 0x0000ffff58557d14
>> 0.00% java [unknown] [.] 0x0000ffff785c03b4
>> 0.00% java [unknown] [.] 0x0000ffff58386520
>>
>> I couldn't see any events that were symbolised before but are no longer
>> symbolised after this patch.
>
> I see this; thanks for digging into the explanation! Were you able to
> get a test case where the unknowns went down? For example, by forcing
> the code cache size to be small? This is the result I'd expect to see.

I tried the same DaCapo benchmark as you with different values of
InitialCodeCacheSize and grepped the report output for -e '\[unknown\]'
-e '\[JIT\]'.
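That is, something like this (same file names as in your example):

  /tmp/perf/perf report -i /tmp/perf-jit.data \
    | grep -c -e '\[unknown\]' -e '\[JIT\]'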

InitialCodeCacheSize   Base   Patched
100M                    338       373
50M                     333       315
25M                     323       368
15M                    1238       309
10M                    2600       333
1M                     6035       337

This looks fairly convincing to me: the cliff at 15M is where the code
cache starts needing to be enlarged.

>
>> I think most of these unknown events are caused by the asynchronous
>> nature of the JVMTI event handling. After an nmethod is compiled the
>> JVMTI event is posted to the Service Thread (*). So there can be a delay
>> between the time the compiled code starts executing and the time the
>> plugin receives the compiled code load event.
>>
>> Here's an edited down example:
>>
>> java 215881 750014.947873: 1289634 cycles:u: ffff7856ad10 [unknown] ([unknown])
>> Service Thread 215895 750014.947971: PERF_RECORD_MMAP2 215879/215895: [0xffff785694c0(0x640) @ 0x40 fd:01 121010091 1]:
>> java 215881 750014.948665: 1295994 cycles:u: ffff7856ad10 org.python.core.__builtin__.range(org.python
>>
>> The plugin receives the event ~100us after the first sample inside that
>> method. Ideally we would use the timestamp when the method was actually
>> compiled, but I can't see any way to extract this information.
>
> Hmm.. this is a bit weird as the compile_info at one point was a
> resource allocation and so would be cleared out when the resource mark
> was passed on the compiler's return. Not something you'd want to do
> asynchronously. Presumably this has changed to improve performance,
> but doing the jvmti on a separate thread defeats jitdump - if we see
> samples in code ahead of the code being loaded. Perhaps a profiler
> like async-profiler
> (https://github.com/jvm-profiling-tools/async-profiler) has a
> workaround for the lag.
>

I had a brief look at the async-profiler JVMTI code but couldn't see
anything it does differently. Their profiler.sh attaches to a running
JVM, though, so I suspect this problem will be less obvious once the
JIT has warmed up.

The compile_info structure is still resource-allocated in
JvmtiExport::post_compiled_method_load(), but that function is called
from the service thread, not the compiler thread. The compiler thread
just pushes an nmethod* onto a queue, and the inlining data is later
recovered from the nmethod object.

It seems we could extend JVMTI here to pass the code installation
timestamp as another CMLR record type.
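
Just to sketch what I mean, roughly a new compiled method load record
(CMLR) kind in jvmticmlr.h. The names and the kind value below are made
up, not an existing part of the header:

  #include <jvmticmlr.h>
  #include <stdint.h>

  /* Hypothetical new CMLR kind; JVMTI_CMLR_DUMMY is 1 and
   * JVMTI_CMLR_INLINE_INFO is 2, so 3 would be the next free value. */
  #define JVMTI_CMLR_INSTALL_TIME 3

  typedef struct _jvmtiCompiledMethodLoadInstallTimeRecord {
      jvmtiCompiledMethodLoadRecordHeader header;   /* kind = 3 */
      /* e.g. CLOCK_MONOTONIC nanoseconds captured by the compiler
       * thread when the nmethod was installed in the code cache */
      uint64_t install_time_ns;
  } jvmtiCompiledMethodLoadInstallTimeRecord;

The agent's CompiledMethodLoad callback could then walk the
compile_info list and, if it finds such a record, use install_time_ns
for the jitdump CODE_LOAD timestamp instead of the time the event was
delivered.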

--
Nick