Re: [RFC] perf record: missing buildid for callstack modules

From: Arnaldo Carvalho de Melo
Date: Wed Jan 13 2016 - 10:27:52 EST

Em Wed, Jan 13, 2016 at 10:57:38AM +0100, Ingo Molnar escreveu:
> * Arnaldo Carvalho de Melo <acme@xxxxxxxxxx> wrote:
> > Em Tue, Jan 12, 2016 at 11:39:43AM +0100, Ingo Molnar escreveu:
> > > But perf tooling cares very much: it can lead to subtle bugs and bad data if we
> > > display a profile with the wrong DSO or binary. 'Bad' profiles resulting out of
> > > binary mismatch can be very convincing and can send developers down the wrong path
> > > for hours. I'd expect my tooling to not do that.

> > > Path names alone (the thing that exec() cares about) are not unique enough to
> > > identify the binary that was profiled. So we need a content hash - hence the
> > > build-ID.

> > > Can you suggest a better solution than a build-time calculated content hash?

> > > As for binary formats that suck and don't allow for a content hash: we do our
> > > best, but of course the risk of data mismatch is there. We could perhaps cache the
> > > binary inode's mtime field to at least produce a 'profile data is older than
> > > binary/DSO modification date!' warning. (Which check won't catch all cases, like
> > > cross-system profiling data matches.)

> > So, we could think of this as: binary formats that want to aid
> > observability tools to:

> > 1) Detect mismatches in contents for DSOs present at recording time to
> > those to be used at analysis time.

> > 2) Find symtabs, DSO binary contents, CFI tables, present in the DSO
> > where samples were taken.

> > Using mtime, as suggested in other messages will help with #1, but not
> > with #2.
> But but ... why is #2 a problem with mtime? If we have an out of date record in
> the, then the is uninteresting in 99% of the usecases! It's
> out of date, most likely because the binary the developer is working on got
> rebuilt, or the system got upgraded - in both cases the developer does not care
> about the old records anymore...

Oh, with mtime we'll be able to, with some effort, most of the time,
find the right symtab/DSO contents, it is not as good as the
content-based build-id, but it improves the current situation, for, to
reuse an euphemism, 99% of the cases ;-)

It is just a pity that what would arguably be the last step to make
content-based DSO identifiers a first class citizen, already available
mostly everywhere, i.e. in ELF binaries will have to wait a bit more.

With PeterZ's change to make the filename length part of the
PERF_RECORD_MMAP3 we at least leave the door open to including that
cookie without having to introduce PERF_RECORD_MMAP4 :-)

> What matters is #1, to detect mismatches, to be a reliable tool. Once we've
> detected that, we can inform the user and our job is mostly done.

Well, we could tell the user to install the package with that symtab, in fedora
it is called 'foo-debuginfo' and can be installed via, for instance (example
taken from a gdb post somewhere):

[root@zoo ~]# dnf --enablerepo='*debug*' install /usr/lib/debug/.build-id/3d/f5385c6be529423a8ae3dd39a3deb9425201cc
Using metadata from Wed Jan 13 12:12:16 2016 (0:02:56 hours old)
Dependencies resolved.
Package Arch Version Repository Size
glibc-debuginfo x86_64 2.20-8.fc21 updates-debuginfo 9.2 M

Transaction Summary
Install 1 Package

Total download size: 9.2 M
Installed size: 58 M
Is this ok [y/N]:


I.e. infrastructure is in place to get this "time machine" you mention below, is somewhat
in place to get a symtab for a DSO, be it the latest version of some versions ago.

Which could be useful to understand the behaviour of code in production while an
update can't be applied (not vetted by powers that be, whatever reason).

> But reliable != perfect time machine. Really, #2 is a second, third order concern
> that should never cause slowdowns on the magnitude that Peter is complaining
> about!

Sure, mtime will improve peterz's usecase, no question about it.

> I realize that there might be special workflows (such as system-wide monitoring)
> where collecting at recording time might be useful, but those are not the common
> case - and they should not slow down the common case.

Sure, no question about this.

> > Checking for inefficiencies in the current approach of
> > right-after-recording post-processing looking for PERF_RECORD_MMAPs,
> > Adrian suggested something here, also disabling the saving into
> > ~/.debug/ will help, collecting numbers would be great.
> I think Peter mentioned a number: the kernel build time almost _doubles_ with
> this. That's clearly unacceptable.

And for him, most of the time, not a problem, in fact we could argue that he
has not a problem, Stephane seems to have ;-)

> > But the mtime thing also requires traversing the whole
> > contents looking for those paths in PERF_RECORD_MMAP records.

> But why? Why cannot we do it at perf report time, when we will parse them anyway?

Hey, I was talking only about 'record' time.

And at record, we don'have to traverse the whole contents as soon as
we add a new PERF_RECORD_MMAP3, just not with the initial motivation for it (a
content-based cookie), but instead the DSOs's mtime :-)

- Arnaldo