Re: [PATCH v2] perf report: distinguish between inliners in the same function

From: Milian Wolff
Date: Tue May 16 2017 - 09:18:22 EST


On Dienstag, 16. Mai 2017 02:53:32 CEST Namhyung Kim wrote:
> On Mon, May 15, 2017 at 12:01:54PM +0200, Milian Wolff wrote:
> > On Monday, May 15, 2017 3:21:58 AM CEST Namhyung Kim wrote:
> > > Hi Milian,
> > >
> > > On Sun, May 14, 2017 at 08:10:50PM +0200, Milian Wolff wrote:
> > > > On Freitag, 12. Mai 2017 15:01:29 CEST Namhyung Kim wrote:
> > > > > On Fri, May 12, 2017 at 12:37:01PM +0200, Milian Wolff wrote:
> > > > > > On Mittwoch, 10. Mai 2017 07:53:52 CEST Namhyung Kim wrote:
> > > > > > > Hi,
> > > > > >
> > > > > > > On Wed, May 03, 2017 at 11:35:36PM +0200, Milian Wolff wrote:
> > > > > > <snip>
> > > > > >
> > > > > > > > +static enum match_result match_chain_srcline(struct
> > > > > > > > callchain_cursor_node
> > > > > > > > *node, + struct callchain_list *cnode)
> > > > > > > > +{
> > > > > > > > + char *left = get_srcline(cnode->ms.map->dso,
> > > > > > > > + map__rip_2objdump(cnode->ms.map, cnode->ip),
> > > > > > > > + cnode->ms.sym, true, false);
> > > > > > > > + char *right = get_srcline(node->map->dso,
> > > > > > > > + map__rip_2objdump(node->map, node->ip),
> > > > > > > > + node->sym, true, false);
> > > > > > > > + enum match_result ret = match_chain_strings(left, right);
> > > > > > >
> > > > > > > I think we need to check inlined srcline as well. There might
> > > > > > > be a
> > > > > > > case that two samples have different addresses (and from
> > > > > > > different
> > > > > > > callchains) but happens to be mapped to a same srcline IMHO.
> > > > > >
> > > > > > I think I'm missing something, but isn't this what this function
> > > > > > provides?
> > > > > > The function above is now being used by the match_chain_inliner
> > > > > > function
> > > > > > below.
> > > > > >
> > > > > > Ah, or do you mean for code such as this:
> > > > > >
> > > > > > ~~~~~
> > > > > > inline_func_1(); inline_func_2();
> > > > > > ~~~~~
> > > > > >
> > > > > > Here, both branches could be inlined into the same line and the
> > > > > > same
> > > > > > issue
> > > > > > would occur, i.e. different branches get collapsed into the first
> > > > > > match
> > > > > > for
> > > > > > the given srcline?
> > > > > >
> > > > > > Hm yes, this should be fixed too.
> > > > >
> > > > > OK.
> > > > >
> > > > > > But, quite frankly, I think it just shows more and more that the
> > > > > > current
> > > > > > inliner support is really fragile and leads to lots of issues
> > > > > > throughout
> > > > > > the code base as the inlined frames are different from non-inlined
> > > > > > frames, but should most of the same be handled just like them.
> > > > > >
> > > > > > So, maybe it's time to once more think about going back to my
> > > > > > initial
> > > > > > approach: Make inlined frames code-wise equal to non-inlined
> > > > > > frames,
> > > > > > i.e.
> > > > > > instead of requesting the inlined frames within match_chain, do it
> > > > > > outside
> > > > > > and create callchain_node/callchain_cursor instances (not sure
> > > > > > which
> > > > > > one
> > > > > > right now) for the inlined frames too.
> > > > > >
> > > > > > This way, we should be able to centrally add support for inlined
> > > > > > frames
> > > > > > and
> > > > > > all areas will benefit from it:
> > > > > >
> > > > > > - aggregation by srcline/function will magically work
> > > > > > - all browsers will automatically display them, i.e. no longer
> > > > > > need to
> > > > > > duplicate the code for inliner support in perf script, perf report
> > > > > > tui/
> > > > > > stdio/...
> > > > > > - we can easily support --inline in other tools, like `perf trace
> > > > > > --call-
> > > > > > graph`
> > > > > >
> > > > > > So before I invest more time trying to massage match_chain to
> > > > > > behave
> > > > > > well
> > > > > > for inline nodes, can I get some feedback on the above?
> > > > >
> > > > > Fair enough. I agree that it'd be better adding them as separate
> > > > > callchain nodes when resolving callchains.
> > > >
> > > > Can you, or anyone else more involved with the current callchain code,
> > > > guide me a bit?
> > > >
> > > > My previous attempt at doing this can be seen here:
> > > > https://github.com/milianw/linux/commit/
> > > > 71d031c9d679bfb4a4044226e8903dd80ea601b3
> > > >
> > > > There are some issues with that. Most of it boils down to the question
> > > > of
> > > > how to feed an inlined frame into a callchain_cursor_node. The latter
> > > > contains a symbol* e.g. and users of that class currently rely on
> > > > using
> > > > the IP to find e.g. the corresponding srcline.
> > > >
> > > > From what I can see, we either have to hack inline nodes in there, cf.
> > > > my
> > > > original approach in the github repo above. Or, better, we would have
> > > > to
> > > > (heavily?) refactor the callchain_cursor_node struct and the code
> > > > depending on it. What I have in mind would be something like adding
> > > > two
> > > > members to this struct:
> > > >
> > > > const char* funcname;
> > > > const char* srcline;
> > > >
> > > > For non-inlined frames, the funcname aliases the `symbol->name` value,
> > > > and
> > > > the srcline is queried as-needed. For inlined frames the values from
> > > > the
> > > > inlined node struct are used. The inlined frames for a given code
> > > > location would all share the same symbol and ip.
> > > >
> > > > Would that be OK as a path forward?
> > >
> > > I think you'd be better adding (fake) dso and sym to keep the inline
> > > information. The fake dso can be paired with the original dso and
> > > maintain a tree of (inlined) symbols. You may need a fake map to
> > > point the fake dso then. It seems a bit compilcated but that way the
> > > code will be more consistent and easier to handle (e.g. for caching
> > > and/or deletion) IMHO.
> >
> > Can you expand on this please? How would that solve the problem of finding
> > a function name or srcline for a given inline frame?
> >
> > I.e.: the function name is, currently, part of the sym. So the fake
> > dso/map
> > would contain an internal, fake, string table which fake symbols could
> > leverage for the function name?
> >
> > Sounds like doable, but also sounds like *a lot* of work. And I don't see
> > how that would solve the srcline situation: That one is queried on-demand
> > based on the IP of a frame. I would say that inline frames should keep
> > the IP of the first non-inlined frame. But that would make it impossible
> > to find the srcline for the N'th inlined frame... Am I missing something?
>
> I agree that srcline info can be kept in callchain cursor nodes, but I
> still think function name should be in (fake) symbols. Sharing a
> symbol for all inlined frames would not work for the children mode
> IMHO.

I'm running into a bit of trouble here. I hoped to be able to store the
inlined symbol in the DSO to reuse it for other inlined entries that use the
same function. I also hope that this will then take care of the deletion of
the fake symbols, once the dso is freed.

To do this, I thought I could use dso__find_symbol_by_name and, if nothing was
found, I create the new fake symbol by symbol__new and insert it via
dso__insert_symbol. Apparently, I also need to update the sorted lookup table,
so I call dso__sort_by_name, but that then leads to crashes:

==22675== Invalid write of size 8
==22675== at 0x4D89FC: rb_link_node (rbtree.h:82)
==22675== by 0x4D89FC: symbols__insert_by_name (symbol.c:387)
==22675== by 0x4D89FC: symbols__sort_by_name (symbol.c:398)
==22675== by 0x4D89FC: dso__sort_by_name (symbol.c:512)
==22675== by 0x4EB704: unwind_entry (machine.c:2061)
==22675== by 0x55BCF7: entry (unwind-libunwind-local.c:600)
==22675== by 0x55BCF7: get_entries (unwind-libunwind-local.c:723)
==22675== by 0x55BE01: _unwind__get_entries (unwind-libunwind-local.c:745)
==22675== by 0x4E8B20: sample__resolve_callchain (callchain.c:1016)
==22675== by 0x5196F9: hist_entry_iter__add (hist.c:1039)
==22675== by 0x448044: process_sample_event (builtin-report.c:204)
==22675== by 0x4F61DD: perf_evlist__deliver_sample (session.c:1211)
==22675== by 0x4F61DD: machines__deliver_event (session.c:1248)
==22675== by 0x4F66D3: perf_session__deliver_event (session.c:1310)
==22675== by 0x4F66D3: ordered_events__deliver_event (session.c:119)
==22675== by 0x4F9CD2: __ordered_events__flush (ordered-events.c:210)
==22675== by 0x4F9CD2: ordered_events__flush.part.3 (ordered-events.c:277)
==22675== by 0x4F6DEC: perf_session__process_user_event (session.c:1349)
==22675== by 0x4F6DEC: perf_session__process_event (session.c:1475)
==22675== by 0x4F8C5C: __perf_session__process_events (session.c:1867)
==22675== by 0x4F8C5C: perf_session__process_events (session.c:1921)
==22675== Address 0xd1fae38 is 24 bytes before a block of size 54 alloc'd
==22675== at 0x4C2CF35: calloc (in /usr/lib/valgrind/vgpreload_memcheck-
amd64-linux.so)
==22675== by 0x4D80F0: symbol__new (symbol.c:240)
==22675== by 0x54750E: dso__load_sym (symbol-elf.c:1121)
==22675== by 0x4D9D3E: dso__load (symbol.c:1535)
==22675== by 0x4F262B: map__load (map.c:292)
==22675== by 0x4F262B: map__find_symbol (map.c:335)
==22675== by 0x4B3B55: thread__find_addr_location (event.c:1458)
==22675== by 0x55BC70: entry (unwind-libunwind-local.c:588)
==22675== by 0x55BC70: get_entries (unwind-libunwind-local.c:723)
==22675== by 0x55BE01: _unwind__get_entries (unwind-libunwind-local.c:745)
==22675== by 0x4E8B20: sample__resolve_callchain (callchain.c:1016)
==22675== by 0x5196F9: hist_entry_iter__add (hist.c:1039)
==22675== by 0x448044: process_sample_event (builtin-report.c:204)
==22675== by 0x4F61DD: perf_evlist__deliver_sample (session.c:1211)
==22675== by 0x4F61DD: machines__deliver_event (session.c:1248)
==22675==

Here's the patch that I tried to play around with this concept:

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index a98f55a91cb2..4d1b6e58da00 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -2037,6 +2037,38 @@ static int unwind_entry(struct unwind_entry *entry,
void *arg)

if (symbol_conf.hide_unresolved && entry->sym == NULL)
return 0;
+
+ if (symbol_conf.inline_name && entry->map) {
+ u64 addr = map__rip_2objdump(entry->map, entry->ip);
+ struct inline_node *inline_node;
+ inline_node = dso__parse_addr_inlines(entry->map->dso, addr);
+ if (inline_node) {
+ struct inline_list *ilist;
+ list_for_each_entry(ilist, &inline_node->val, list) {
+ struct symbol *sym;
+ sym = dso__find_symbol_by_name(entry->map-
>dso,
+ MAP__FUNCTION,
+ ilist-
>funcname);
+ if (!sym) {
+ fprintf(stderr, "create new entry\n");
+ sym = symbol__new(entry->sym->start,
+ entry->sym->end,
+ entry->sym->binding,
+ ilist->funcname);
+ dso__insert_symbol(entry->map->dso,
+ MAP__FUNCTION,
+ sym);
+ dso__sort_by_name(entry->map->dso,
+ MAP__FUNCTION);
+ fprintf(stderr, "%p | %p | %s | %s\n",
sym,
+
dso__find_symbol_by_name(entry->map->dso, MAP__FUNCTION, ilist->funcname),
+ sym->name, ilist->funcname);
+ }
+ fprintf(stderr, "want to add inline: %s %p\n",
ilist->funcname, sym);
+ }
+ inline_node__delete(inline_node);
+ }
+ }
return callchain_cursor_append(cursor, entry->ip,
entry->map, entry->sym,
false, NULL, 0, 0);
~~~~~~~~~~~~~~~

Any idea what I'm doing wrong here? Can I not insert into the sorted list?
I.e. is it not safe to sort multiple times? Is this the wrong approach in
general?

Also, any suggestion as to what I'd use for symbol start/end for the inlined
fake symbols?

Thanks

> > > Also, for the children mode, callchain nodes should have enough
> > > information to create hist entries (but I'm not sure how to apply
> > > self periods for those inlined entries).
> >
> > I'm not sure I'm following here either, but once inlined frames are normal
> > callchain nodes, all browsers will simply support them out of the box, no?
> > All of the aggregation and browsing features should just workâ. We'd only
> > need to patch the browsers for special usecases, like when we want to
> > change the visuals to make it clear that a given frame was actually
> > inlined.
>
> Yes, once inlined frames converted to callchain cursor nodes, it
> should work out of the box. But for children mode, it needs a symbol
> to construct a hist entry from a callchain cursor node. Please see
> fill_callchain_info(). If you add the "(inline)" to the (fake) symbol
> name, you don't even need to patch the browsers IMHO.

Good idea, I'll do that once I get it to work (see above)

Cheers

--
Milian Wolff | milian.wolff@xxxxxxxx | Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts

Attachment: smime.p7s
Description: S/MIME cryptographic signature