Re: [discuss] Re: [Perfctr-devel] Re: Enabling RDPMC in user space by default

From: Andi Kleen
Date: Tue Nov 29 2005 - 17:51:23 EST


On Tue, Nov 29, 2005 at 02:19:15PM -0800, Stephane Eranian wrote:
> Andi,
>
> On Tue, Nov 29, 2005 at 10:52:07PM +0100, Andi Kleen wrote:
> > On Tue, Nov 29, 2005 at 01:43:11PM -0800, Nicholas Miell wrote:
> > > On Tue, 2005-11-29 at 19:13 +0100, Andi Kleen wrote:
> > > > > Where did you see that PMC0 (PERSEL0/PERFCTR0) can only be programmed
> > > > > to count cpu cycles (i.e. cpu_clk_unhalted)? As far as I can tell from
> > > > > the documentation, the 4 counters are symetrical and can measure
> > > > > any event that the processor offers.
> > > >
> > > > Linux NMI watchdog does that.
> > > >
> > > > All other perfctr users are supposed to keep their fingers away
> > > > from the watchdog (it looks like oprofile doesn't but not for much
> > > > longer ...)
> > >
> > > Why? Hardcoding PMC 0 to be a cycle counter seems to be a waste of a
> > > perfectly usable performance counter. What if I want to profile four
> > > things, none of them requiring a cycle count?
> >
>
> On AMD you only have 4 counters. That's not a lot for some measurements.

Disabling the NMI watchdog for that is out of question. It's a important
debugging device and without it kernel bug reports are much worse.
It increased the quality of x86-64 bug reports over the years
considerably and I'm unwilling to give that up.

I didn't realize oprofile did this so far, but I plan to definitely
fix this.

> The other thing is that PERCTR0 is not like the TSC. It can count cycles
> but it does only implement 47bits. At a high clock rate, this can wrap
> around fairly rapidly. It all depends on what is the intended usage model.

TSC also doesn't count cycles in many circumstances (different frequency
depending on P states or not synchronized over CPUs, even running
at completely different frequencies etc.)

> Suppose you would have a "stable" performance monitoring interface.

Then you don't use TSC because it's not stable (or conversely
too stable for many performance measurements because it doesn't
follow the P states)

> One could just use that interface to measure time only when needed.

Good debugging infrastructure has priority imho - and NMI watchdog
is important. You will need to live with three counters.

This means there is one alternative - some of the newer chipsets
have external watchdogs that could be also used (using the ACPI WDOG
table). If someone writes a nice NMI driver for these then on system
with working WDOG it could replace the perfctr based timeout and free
the perfctr. That would need some code to allocate and deallocate
perfctrs though.

The older IOAPIC watchdog is no alternative because it runs too often (at HZ)
and has too much overhead.

-Andi

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/