Re: [RFC] perf record: missing buildid for callstack modules
From: Ingo Molnar
Date: Tue Jan 12 2016 - 07:18:17 EST
* Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> On Tue, Jan 12, 2016 at 11:39:43AM +0100, Ingo Molnar wrote:
> > > Does the kernel even know about the buildid crap? AFAIK the binfmt stuff doesn't
> > > know or care about things like that. Heck, we support binfmts that do not even
> > > have a buildid.
> >
> > The kernel's exec() code does not care about the past, it will execute whatever is
> > fit to execute right now.
> >
> > But perf tooling cares very much: it can lead to subtle bugs and bad data if we
> > display a profile with the wrong DSO or binary. 'Bad' profiles resulting out of
> > binary mismatch can be very convincing and can send developers down the wrong path
> > for hours. I'd expect my tooling to not do that.
>
> Well, it really is rather a rare case, replacing binaries you're
> profiling. Sure, if it happens (by accident or otherwise) it can be a
> pain, but the cost of fixing this 'problem' is huge.
But isn't this the common case for developers, who rebuild their binaries all the
time, while profiling them? Looking at the wrong profile without having an
indication that it's wrong is a problem.
> > Path names alone (the thing that exec() cares about) are not unique enough to
> > identify the binary that was profiled. So we need a content hash - hence the
> > build-ID.
> >
> > Can you suggest a better solution than a build-time calculated content hash?
>
> Not really, but the current 'solution' is a massive pain. The result is that
> perf-record needs to do a full scan of the recorded data after completion and
> look for buildids across the system.
>
> On my system that pass takes longer than the actual workload (of building a
> kernel). Furthermore, the resulting data is useless for me.
Hm, that's a powerful performance argument. Why is it so slow? I'd assume that by
default we only need to save the build-ID itself per object - which is like 20
bytes?
> > As for binary formats that suck and don't allow for a content hash: we do our
> > best, but of course the risk of data mismatch is there. We could perhaps cache
> > the binary inode's mtime field to at least produce a 'profile data is older
> > than binary/DSO modification date!' warning. (Which check won't catch all
> > cases, like cross-system profiling data matches.)
>
> So my problem with the kernel side thing is that I fear it will, again, be a
> partial solution, and we'll still end up scanning the perf-record output, ie.
> nothing better than we are now.
>
> Sure, maybe we can have binfmt_elf read the buildid and cache it someplace,
> maybe we can even have the other binfmt thingies do something similar (at small
> cost, we obviously cannot compute hashes over files at exec() time, that would
> upset people).
>
> But what do we do for DSOs, does dlopen() ever end up in the binfmt code? I
> would think not, I would fully expect the dynamic linker to just mmap() the
> relevant bits and be done with it.
>
> And we cannot, at mmap() time, 'assume' the file is ELF and try prodding into it
> to find a buildid or whatnot.
>
> And all for some weird corner case.
So could we perhaps just switch the whole thing over to be mtime based: mtime is
pretty indicative of whether a binary is the right one or not.
And mtime could be checked at perf report time, not at perf record time: we'd only
have to check whether the mtime of the object we read at perf report time is newer
than the mtime of the perf.data (the creation of the profile).
This does not solve rare corner cases like cross-system profiling, but I think the
common case should not be burdened with the overhead of a rare case.
Thanks,
Ingo