Re: [PATCH 4/4] trace: profile all if conditionals

From: Steven Rostedt
Date: Sun Nov 23 2008 - 15:33:38 EST



On Sun, 23 Nov 2008, Andi Kleen wrote:

> On Sun, Nov 23, 2008 at 02:56:42PM -0500, Steven Rostedt wrote:
>
> You snipped my earlier suggestion? Can't you just use kernel gcov
> for this? Frankly its output is infinitely more useful than
> the one from your patch. It also addresses Andrew's suggestion
> of profiling other control flow constructs.

I'll have to read up more on gcov.

>
> I know it's not ftrace, but not everything is bad just because it's
> not seen through the ftrace spectacles @)

Actually, the only thing the full branch profiler shares with ftrace is
the directory. It really does not use any of the function tracing or
ring buffer facilities. The "ftrace" name is just a catch-all for any
type of tracing or profiling that I do ;-)

Notice, the subject is "trace:" not "ftrace:". This is because it did
not belong to the true ftrace facility. It really should have been
"profiling:" but I was tired when I wrote it.

>
> > On Sun, 23 Nov 2008, Andi Kleen wrote:
> > > Steven Rostedt <rostedt@xxxxxxxxxxx> writes:
> >
> > > > This adds a significant amount of overhead and should only be used
> > > > by those analyzing their system.
> > >
> > > Often this can also be done using CPU performance counters. Might
> > > be a cheaper option.
> >
> > I'd love to add an option that could hook into any arch with HW support
> > for this. We could dump out the same information, just with a different
> > way to gather it. But I'm still ignorant of the use of CPU performance
> > counters and how to find which branch matches which if.
>
> The theory is quite simple. Typically there are events for
> "taken branches" and others for "non taken". So you set up
> two counters using the existing oprofile support and collect
> the samples. Then combine these two sample streams.
>
> [Sometimes you have to also synthesize these
> events because CPUs like to count predicted and mispredicted
> (in the CPU sense) differently, but that's also quite simple
> (on x86/core these can be all specified in the unit mask for
> the same event)]
>
> The sampling will be statistical (not every branch counted),
> but that's ok because only branches that are executed a significant
> number of times are interesting anyway.
>
> The only problem is you have to map back to source code lines, which
> can be done in user space based on the oprofile output and some
> addr2line or similar hacks. oprofile can also do this,
> although it gives this information only indirectly so a custom
> tool might be easier.
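
[A minimal sketch of the "combine these two sample streams" step described
above, assuming each stream has already been reduced to (address, count)
pairs. That layout and the name combine_branch_samples are hypothetical;
this is not oprofile's real output format. Mapping addresses back to source
lines (addr2line or similar) is left out.]

```python
from collections import defaultdict


def combine_branch_samples(taken, not_taken):
    """Merge two per-address sample streams into per-branch statistics.

    Both arguments are iterables of (address, sample_count) pairs.
    This input layout is an assumption made for illustration only.
    """
    # address -> [taken samples, not-taken samples]
    stats = defaultdict(lambda: [0, 0])
    for addr, count in taken:
        stats[addr][0] += count
    for addr, count in not_taken:
        stats[addr][1] += count

    result = {}
    for addr, (t, n) in stats.items():
        total = t + n
        result[addr] = {
            "taken": t,
            "not_taken": n,
            # Fraction of sampled executions in which the branch was taken.
            "taken_ratio": t / total if total else 0.0,
        }
    return result


if __name__ == "__main__":
    # Made-up sample counts, purely to show the call shape.
    merged = combine_branch_samples([(0x1000, 30), (0x2000, 5)],
                                    [(0x1000, 10)])
    for addr in sorted(merged):
        print(hex(addr), merged[addr])
```

[Because the counts are samples, the ratios are estimates, and they are only
meaningful for branches that were hit often enough to collect many samples.]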

My work revolves around not adding any userspace tool that is not already
supported by busybox. I'm not against anyone else doing this work. It's
just that I'm not interested in such. In other words, I'd rather have
someone else do that work.

>
> Note it doesn't even need new kernel code, assuming
> the architecture already has a working full oprofile implementation.
>
> The main advantage over gcov would be lower runtime overhead,
> although gcov gives the better output (and is already working,
> too)


This all sounds great, and perhaps someone might decide to do this. The
full branch tracer I added, I did in a few hours and tested on both x86
and PPC. That was because I was waiting on output from a test running on
another box that was taking a couple of hours, and I got bored.

I'm not against any of the suggestions you make. I'm just waiting for
someone else to do it ;-)

-- Steve
