Re: [RFC][PATCH 00/11] perf pmu interface -v2

From: Paul Mundt
Date: Thu Jul 01 2010 - 22:58:06 EST


On Thu, Jul 01, 2010 at 05:39:53PM +0200, Peter Zijlstra wrote:
> On Thu, 2010-07-01 at 16:31 +0100, Matt Fleming wrote:
> > On Thu, Jul 01, 2010 at 05:02:35PM +0200, Peter Zijlstra wrote:
> > >
> > > Matt, you said it broke SH completely, but did you try perf stat? perf
> > > record is not supposed to work on SH due to the hardware not having an
> > > overflow interrupt.
> >
> > perf record does work to some degree. It definitely worked before
> > applying your changes but not after. I admit I haven't really read the
> > perf event code, but Paul will know.
>
> Ok, let me look at that again.
>
Any perf record functionality observed is entirely coincidental and not
by design. It was something I planned to revisit, but most of what we
have right now is only geared at the one-shot perf stat case.

> > > Which made me think, what on SH guarantees we update the counter often
> > > enough not to suffer from counter wrap? Would it make sense to make the
> > > SH code hook into their arch tick handler and update the counters from
> > > there?
> >
> > This was the way that the oprofile code used to work. Paul and I were
> > talking about using a hrtimer to sample performance counters as
> > opposed to piggy-backing on the tick handler.
>
> Ah, for sampling for sure, simply group a software perf event and a
> hardware perf event together and use PERF_SAMPLE_READ.
>
> But suppose it's a non-sampling counter, how do you avoid overflows of
> the hardware register?

At the moment it's not an issue since we have big enough counters that
overflows don't really happen, especially if we're primarily using them
for one-shot measuring.
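
To put a rough number on "big enough", here is a back-of-the-envelope
sketch of how long a counter of a given width takes to wrap. The helper
name and the event rate used in the note below are illustrative
assumptions, not measurements from any SH part.

```c
#include <assert.h>

/*
 * Seconds until a counter that is `bits` wide wraps when it is
 * incremented `events_per_sec` times a second.
 * Hypothetical helper, for illustration only.
 */
static inline double seconds_to_wrap(unsigned int bits, double events_per_sec)
{
	double span = 1.0;
	unsigned int i;

	assert(bits >= 1 && bits <= 64 && events_per_sec > 0.0);

	/* Build 2^bits by repeated doubling so that bits == 64 is safe. */
	for (i = 0; i < bits; i++)
		span *= 2.0;

	return span / events_per_sec;
}
```

Assuming an event that fires once per cycle at a made-up 600 MHz, a
32-bit counter would wrap after roughly 7 seconds, while a 48-bit one
would last a bit over 5 days, which is plenty for a one-shot perf stat
run.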

With SH-4A style counters we have 2 general purpose counters and 2
counters for measuring bus transactions. The bus counters can optionally
be disabled and used in a chained mode to extend the general purpose
counters to 64 bits (how many bits of the upper half are actually valid
varies by CPU, but all of them can do at least 48 bits when chained).
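
As a minimal sketch of what reading a chained counter could look like,
assuming the two halves are read as separate 32-bit values and that the
CPU-specific valid width is known up front (the function and parameter
names here are hypothetical, and so is the choice of which half lands in
the upper word):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Combine the two halves of a chained counter into one value and drop
 * the upper bits this CPU does not implement.
 *  hi, lo      - raw 32-bit halves (assumed layout)
 *  valid_bits  - usable width of the chained counter, e.g. 48 or 64
 */
static inline uint64_t chained_count(uint32_t hi, uint32_t lo,
				     unsigned int valid_bits)
{
	uint64_t raw, mask;

	assert(valid_bits >= 32 && valid_bits <= 64);

	raw = ((uint64_t)hi << 32) | lo;

	/* Shifting a 64-bit value by 64 is undefined, so special-case it. */
	mask = (valid_bits == 64) ? UINT64_MAX
				  : ((UINT64_C(1) << valid_bits) - 1);

	return raw & mask;
}
```

Masking on every read keeps whatever garbage lives in the
unimplemented upper bits out of the reported count.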

Each counter has overflow detection and asserts an overflow bit, but
there are no exceptions associated with this, so it's something that we
would have to tie in to the tick or defer to a bottom half handler in the
non-sampling case (or simply test on every read, and accept some degree
of accuracy loss). Any perf record functionality we implement with this
sort of scheme is only going to provide ballpark figures anyways, so it's
certainly within the parameters of acceptable loss in exchange for
increased functionality.

Different CPUs also implement their overflows differently, some will roll
and resume counting, but most simply stop until the overflow bit is
cleared.
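
A rough sketch of the "test on every read" (or on every tick) approach,
covering both overflow behaviours. Everything here is an assumption for
illustration: the struct and function names, the 32-bit unchained width,
at most one wrap between polls for counters that roll, and a counter
that restarts from zero once a stopped counter's overflow bit has been
cleared.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct soft_counter {
	uint64_t total;	/* accumulated software count */
	uint32_t prev;	/* hardware value seen at the last poll */
	bool lost;	/* some events could not be accounted for */
};

/*
 * Fold one hardware reading into the software total.
 *  hw    - current raw counter value
 *  ovf   - state of the overflow bit
 *  rolls - true if this CPU's counter wraps and keeps counting,
 *          false if it stops until the overflow bit is cleared
 * Returns true when the caller should clear the overflow bit.
 */
static inline bool soft_counter_poll(struct soft_counter *c, uint32_t hw,
				     bool ovf, bool rolls)
{
	assert(c);

	if (!ovf) {
		/* No wrap since the last poll: plain delta. */
		c->total += (uint32_t)(hw - c->prev);
		c->prev = hw;
		return false;
	}

	if (rolls) {
		/* One wrap assumed: distance to the top plus what came after. */
		c->total += (UINT64_C(1) << 32) - c->prev + hw;
		c->prev = hw;
	} else {
		/* Counter froze at the wrap; anything after that is gone. */
		c->total += (UINT64_C(1) << 32) - c->prev;
		c->prev = 0;
		c->lost = true;
	}

	return true;
}
```

The lost flag is where the accuracy loss shows up: on CPUs that stop
counting, the longer the gap between polls, the more events go missing
after the overflow.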

My main plan was to build on top of the multi-pmu stuff, unchain the
counters, and expose the bus counters with their own event map as a
separate PMU instance. All of the other handling logic can pretty much be
reused directly, but it does mean that we need to be a bit smarter about
overflow detection/handling. Sampling and so on is also on the TODO list,
but is not yet supported.