Re: Running an Ivy Bridge cpu at fixed frequency

From: Peter Zijlstra
Date: Fri Dec 06 2019 - 05:15:47 EST


On Thu, Dec 05, 2019 at 03:53:55PM +0000, David Laight wrote:
> From: Peter Zijlstra
> > Sent: 05 December 2019 09:46
> > As Andy already wrote, perf is really good for this.
> >
> > Find attached, it probably is less shiny than what Andy handed you, but
> > contains all the bits required to frob something.
>
> You are in a maze of incomplete documentation all disjoint.

I'm sure..

> The x86 instruction set doc (eg 325462.pdf) defines the rdpmc instruction, tells you
> how many counters each cpu type has, but doesn't even contain a reference
> to how they are incremented.

There's book 3, chapter 18, performance monitoring overview, that should
explain how the counters work, and chapter 19 that lists many of the
available events.

TL;DR, they're (48bit) signed counters that increment and raise an
interrupt when the sign flips. This means we set them to '-period' and
then upon read (either early or on interrupt) compute the delta and
accumulate elsewhere.
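
As a rough illustration of the '-period' and delta arithmetic, here is a
minimal sketch. It assumes the 48-bit width mentioned above, and the helper
names (pmc_sign_extend, pmc_start_value, pmc_delta) are made up for this
example; they are not kernel functions.

```c
#include <stdint.h>

#define PMC_WIDTH 48	/* assumed counter width, per the "(48bit)" above */
#define PMC_SHIFT (64 - PMC_WIDTH)

/*
 * Sign-extend a raw PMC_WIDTH-bit counter value to 64 bits.
 * Relies on arithmetic right shift of signed values (gcc/clang behaviour).
 */
static inline int64_t pmc_sign_extend(uint64_t raw)
{
	return (int64_t)(raw << PMC_SHIFT) >> PMC_SHIFT;
}

/* Raw value to program so the sign flips after 'period' events. */
static inline uint64_t pmc_start_value(uint64_t period)
{
	return (uint64_t)(-(int64_t)period) & ((1ULL << PMC_WIDTH) - 1);
}

/* Events counted between two raw reads, correct across the sign flip. */
static inline int64_t pmc_delta(uint64_t prev_raw, uint64_t now_raw)
{
	return (int64_t)((now_raw << PMC_SHIFT) - (prev_raw << PMC_SHIFT))
		>> PMC_SHIFT;
}
```

The caller would add the result of pmc_delta() to a 64-bit software total
on every read or interrupt, which is the "accumulate elsewhere" part.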

> perf_event_open(2) tells you a few things, but doesn't actually say what anything is.
> It contains all but the last 'if' clause of this function, without really saying
> what any of it does - or why you might do it this way.

I don't actually know what's in that manpage. But it really shouldn't be
too hard to understand.

It's a seqcount protected set of values, there's the RDPMC counter index,
and the counter offset. If the idx!=0 it means the counter is actually
programmed and we must RDPMC, the result of which we must add to the
offset.
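
The read described above could look roughly like the sketch below. To keep
it runnable without PMU access it uses a hypothetical struct pmc_page that
mirrors only the three fields discussed (the real layout is struct
perf_event_mmap_page in <linux/perf_event.h>), takes the RDPMC read as a
callback, and hard-codes a 48-bit width. fake_rdpmc() is a stand-in, not a
real instruction wrapper.

```c
#include <stdint.h>

/* Hypothetical mirror of the fields discussed; not the real mmap page. */
struct pmc_page {
	volatile uint32_t lock;		/* seqcount, changes around updates */
	volatile uint32_t index;	/* RDPMC index + 1, 0 = not programmed */
	volatile int64_t offset;	/* count accumulated so far */
};

typedef uint64_t (*rdpmc_fn)(uint32_t counter);

#define barrier() __asm__ __volatile__("" ::: "memory")

/* Returns 0 and stores the count, or -1 if no counter is programmed. */
static inline int pmc_read(const struct pmc_page *pc, rdpmc_fn rdpmc,
			   int64_t *out)
{
	uint32_t seq, idx;
	int64_t count;

	do {
		seq = pc->lock;
		barrier();
		idx = pc->index;
		if (!idx)
			return -1;
		count = pc->offset;
		/* sign-extend the assumed 48-bit raw value, then add */
		count += (int64_t)(rdpmc(idx - 1) << 16) >> 16;
		barrier();
	} while (pc->lock != seq);	/* retry if updated underneath us */

	*out = count;
	return 0;
}

/* Stand-in for RDPMC: pretends the counter currently holds -100. */
static inline uint64_t fake_rdpmc(uint32_t counter)
{
	(void)counter;
	return (uint64_t)(-100LL) & 0xFFFFFFFFFFFFULL;
}
```

On real hardware the callback would execute RDPMC, and the width would be
taken from the mmap page rather than assumed.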

The whole counter scaling crud is just that, crud you can mostly forget
about if you want to quickly hack something together. See
mmap_read_pinned() for the simplified (and much faster) version that
ignores all that.


> AFAICT:
> 1) The last clause is scaling the count up to allow for time when the hardware counter
> couldn't be allocated.
> I'm not convinced that is useful, better to ignore the entire measurement.
> Half this got deleted from the man page, leaving strange 'set but unused' variables.

Depending on the usecase, sure. I don't have use for it either. I know
other people find it useful.

> 2) The hardware counters are disabled while the process is asleep.
> On wake a different pmc counter might be used (maybe on a different cpu).
> The new cpu might not even have a counter available.

Right, but if this is all you're running that is unlikely to happen.

> 3) If you don't want to scale up for missing periods it is probably enough to do:
> 	do {
> 		seq = pc->lock;
> 		barrier();
> 		idx = pc->index;
> 		if (!idx)
> 			return -1;
> 		count = pc->offset + rdpmc(idx - 1);
> 		barrier();
> 	} while (seq != pc->lock);
> 	return (unsigned int)count;

You still need to do the rdpmc sign extension crud, but see
mmap_read_pinned() that does just about that.

As the name suggests it relies on using perf_event_attr::pinned = 1.
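
For reference, requesting a pinned event might be sketched as follows. The
event choice (user-space CPU cycles on the calling thread) and the helper
names pinned_cycles_attr() and perf_open_self() are assumptions made for
this example; only the perf_event_attr fields and the syscall itself come
from the kernel ABI.

```c
#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Fill in an attr for a pinned cycle counter. */
static inline void pinned_cycles_attr(struct perf_event_attr *attr)
{
	memset(attr, 0, sizeof(*attr));
	attr->type = PERF_TYPE_HARDWARE;
	attr->size = sizeof(*attr);
	attr->config = PERF_COUNT_HW_CPU_CYCLES;
	attr->pinned = 1;		/* keep the event on the PMU */
	attr->exclude_kernel = 1;	/* assumption: user-space only */
}

/* glibc has no perf_event_open() wrapper, so go through syscall(2). */
static inline int perf_open_self(struct perf_event_attr *attr)
{
	return (int)syscall(SYS_perf_event_open, attr,
			    0 /* calling thread */, -1 /* any cpu */,
			    -1 /* no group */, 0UL /* flags */);
}
```

The returned fd would then be mmap()ed to get at the page that the read
loop operates on.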