Re: [PATCH v2] perf tool: Fix ppid for synthesized fork events

From: Arnaldo Carvalho de Melo
Date: Fri Mar 27 2015 - 10:20:37 EST


On Fri, Mar 27, 2015 at 08:03:28AM -0600, David Ahern wrote:
> On 3/27/15 7:10 AM, Don Zickus wrote:
> >I talked with Joe on my way out the door yesterday and he confirmed, just
> >removing -BN from our test showed a performance hit with your patch. With
> >the -BN option, there is no performance hit and we are perfectly fine with
> >your patch.

> >So, I guess I am confused how the -BN and your patch could change behaviour.

> I am too. This change has nothing to do with buildid's and scanning the
> buildid code. Setting the ppid correctly should not cause any extra work.

> Arnaldo: any thoughts?

    -N, --no-buildid-cache
                          do not update the buildid cache
    -B, --no-buildid      do not collect buildids in perf.data

-B implies -N.

If you say just -N it will, at the end of the record session, process
the samples in the just generated file, creating an rbtree of threads
and mmaps in those threads, so that it can figure out which DSOs
associated with those maps had hits.

For those it will read the on-disk file looking for an ELF section where
it'll find the build-id (a few bytes) and will stash the resulting table
in the perf.data header.
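
To see what that lookup finds for a given DSO, the build-id can be read
out of the ELF notes by hand. This is only a sketch: the helper name
build_id_from_notes is made up for illustration, and it assumes binutils'
"readelf -n" output, which prints a "Build ID:" line for the
.note.gnu.build-id section.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first build-id found in `readelf -n`
# output read on stdin. Prints nothing if no "Build ID:" line is present.
build_id_from_notes() {
    awk '!seen && /Build ID:/ { print $NF; seen = 1 }'
}

# Assumed usage (the path is just an example):
#   readelf -n /bin/ls | build_id_from_notes
# The table stashed in the perf.data header can be listed for comparison:
#   perf buildid-list -i perf.data
```

If the DSO had hits during the session, the id printed for it should
show up in the buildid-list output as well.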

If you instead (or in addition to) specify -B, it will not traverse the
just generated perf.data file, greatly reducing the overhead _at the
end_ of the record session.

With that said, lemme read again what it is that is being measured...

> >Just to re-iterate what we did, Joe kicked off a specJBB run and he did 20
> >captures of two runs (one with the unpatched binary and one with a patched
> >binary).

> >for i in {1..20}
> >do
> > time perf.unpatched mem record -a -e cpu/mem-loads,ldlat=50/pp -e cpu/mem-stores/pp sleep 10
> > time perf.patched mem record -a -e cpu/mem-loads,ldlat=50/pp -e cpu/mem-stores/pp sleep 10
> >done

> >then we repeat the above test but with -BN in both runs. We compare the
> >log sizes to make sure they are similar for the random snapshots and compare
> >the times. With the -BN option, the times are generally within +/- 0.5
> >seconds of each. Without the -BN option the patched perf binary is
> >generally +20-40 seconds slower.
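
To put a number on that comparison, the "real" lines from the loop could
be averaged per binary. A rough sketch, assuming the stderr of each
"time" invocation was appended to one log per binary (the log names and
the helper name mean_real_seconds are made up here), and assuming bash's
default "real 0m31.250s" format:

```shell
#!/usr/bin/env bash
# Hypothetical helper: average the "real" lines printed by bash's `time`
# keyword (e.g. "real 0m31.250s") from a log read on stdin.
# Prints the mean in seconds, or nothing if no "real" line was seen.
mean_real_seconds() {
    awk '
        $1 == "real" {
            # Split "0m31.250s" into minutes and seconds.
            split($2, t, "m")
            sub(/s$/, "", t[2])
            total += t[1] * 60 + t[2]
            n++
        }
        END {
            if (n > 0) printf "%.3f\n", total / n
        }
    '
}

# Assumed usage, with logs captured as e.g.
#   { time perf.unpatched mem record ... sleep 10; } 2>> unpatched.log
#   mean_real_seconds < unpatched.log
#   mean_real_seconds < patched.log
```

Only lines whose first field is "real" are counted, so perf's own
messages in the same log are ignored.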

Humm, it is definitely strange, I would run 'perf record -o
some-other-file' to investigate that...

> >However, based on your description above about what the -BN option does, I
> >am scratching my head about our results. Thoughts?

... which is what David is suggesting here:

> Try this:
> perf record -o unpatched.data -g -- perf.unpatched mem record -a -e
> cpu/mem-loads,ldlat=50/pp -e cpu/mem-stores/pp sleep 10
>
> perf record -o patched.data -g -- perf.patched mem record -a -e
> cpu/mem-loads,ldlat=50/pp -e cpu/mem-stores/pp sleep 10
>
> And then compare the reports for each.

Cache effects, i.e. OS FS caches for the files accessed when generating
the build-id table, could be responsible for part of the difference at
some point, but further investigation by running 'perf record' on the
patched/unpatched binaries will give us more clues.

- Arnaldo