RE: Running an Ivy Bridge cpu at fixed frequency

From: David Laight
Date: Thu Dec 05 2019 - 10:54:01 EST


From: Peter Zijlstra
> Sent: 05 December 2019 09:46
> As Andy already wrote, perf is really good for this.
>
> Find attached, it probably is less shiny than what Andy handed you, but
> contains all the bits required to frob something.

You are in a maze of incomplete documentation all disjoint.

The x86 instruction set doc (eg 325462.pdf) defines the rdpmc instruction, tells you
how many counters each cpu type has, but doesn't even contain a reference
to how they are incremented.
I guess there are some processor-specific MSRs for that.

perf_event_open(2) tells you a few things, but doesn't actually say what anything is.
It contains all but the last 'if' clause of this function, without really saying
what any of it does - or why you might do it this way.

static inline u64 mmap_read_self(void *addr)
{
	struct perf_event_mmap_page *pc = addr;
	u32 seq, idx, time_mult = 0, time_shift = 0, width = 0;
	u64 count, cyc = 0, time_offset = 0, enabled, running, delta;
	s64 pmc = 0;

	do {
		seq = pc->lock;
		barrier();

		enabled = pc->time_enabled;
		running = pc->time_running;

		if (pc->cap_user_time && enabled != running) {
			cyc = rdtsc();
			time_mult = pc->time_mult;
			time_shift = pc->time_shift;
			time_offset = pc->time_offset;
		}

		idx = pc->index;
		count = pc->offset;
		if (pc->cap_user_rdpmc && idx) {
			width = pc->pmc_width;
			pmc = rdpmc(idx - 1);
		}

		barrier();
	} while (pc->lock != seq);

	if (idx) {
		pmc <<= 64 - width;
		pmc >>= 64 - width; /* shift right signed */
		count += pmc;
	}

	if (enabled != running) {
		u64 quot, rem;

		quot = (cyc >> time_shift);
		rem = cyc & ((1 << time_shift) - 1);
		delta = time_offset + quot * time_mult +
			((rem * time_mult) >> time_shift);

		enabled += delta;
		if (idx)
			running += delta;

		quot = count / running;
		rem = count % running;
		count = quot * enabled + (rem * enabled) / running;
	}

	return count;
}
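
The two pieces of arithmetic in the middle of that function can be checked
in isolation. What follows is only a minimal sketch, not part of the attachment:
the helper names sign_extend_pmc() and cyc_to_ns() are made up for illustration,
and it assumes the compiler implements >> on a negative signed value as an
arithmetic shift (gcc and clang do).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: treat a raw rdpmc value as a signed quantity that is
 * only 'width' bits wide (pc->pmc_width) and widen it to 64 bits.
 */
static inline int64_t sign_extend_pmc(uint64_t raw, unsigned int width)
{
	assert(width > 0 && width <= 64);

	/*
	 * Shift left as unsigned so the top bit of the counter lands in bit 63,
	 * then shift right as signed so that bit is replicated downwards.
	 * Assumption: >> on a negative int64_t is an arithmetic shift.
	 */
	return (int64_t)(raw << (64 - width)) >> (64 - width);
}

/*
 * Hypothetical helper: convert a TSC reading to nanoseconds using the
 * mult/shift/offset triple from the mmap page. The cycle count is split
 * into quotient and remainder so cyc * mult is never formed in full.
 */
static inline uint64_t cyc_to_ns(uint64_t cyc, uint32_t mult,
				 uint32_t shift, uint64_t offset)
{
	uint64_t quot, rem;

	assert(shift < 64);

	quot = cyc >> shift;
	rem = cyc & (((uint64_t)1 << shift) - 1);

	return offset + quot * mult + ((rem * mult) >> shift);
}
```

With invented values mult = 512 and shift = 10 (i.e. 0.5ns per cycle),
cyc_to_ns(3000, 512, 10, 0) works out to 1500, and a 48-bit raw value of
all ones sign-extends to -1.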

AFAICT:
1) The last clause is scaling the count up to allow for time when the hardware counter
couldn't be allocated.
I'm not convinced that is useful; it would be better to ignore the entire measurement.
Half of this got deleted from the man page, leaving strange 'set but unused' variables.

2) The hardware counters are disabled while the process is asleep.
On wake a different pmc counter might be used (maybe on a different cpu).
The new cpu might not even have a counter available.

3) If you don't want to scale up for missing periods it is probably enough to do:
	do {
		seq = pc->lock;
		barrier();
		idx = pc->index;
		if (!idx)
			return -1;
		count = pc->offset + rdpmc(idx - 1);
		barrier();
	} while (seq != pc->lock);
	return (unsigned int)count;

Not tried it yet :-)
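
For reference, the scaling in (1) amounts to count * enabled / running, done
as quotient plus remainder so the intermediate product doesn't overflow 64 bits.
A minimal sketch of just that step, with a made-up helper name scale_count();
the early return for running == 0 is my assumption, the function quoted above
has no such check.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: extrapolate a count taken while the event was on the
 * hardware for 'running' ns to the full 'enabled' ns.
 */
static inline uint64_t scale_count(uint64_t count, uint64_t enabled,
				   uint64_t running)
{
	uint64_t quot, rem;

	/* Running time should never exceed enabled time. */
	assert(running <= enabled);

	if (running == 0)
		return 0;	/* assumption: never scheduled, nothing to scale */
	if (enabled == running)
		return count;	/* counter was on the hardware the whole time */

	/*
	 * count * enabled / running, split up so that only
	 * rem * enabled (with rem < running) has to fit in 64 bits.
	 */
	quot = count / running;
	rem = count % running;

	return quot * enabled + (rem * enabled) / running;
}
```

With invented numbers: 1000000 events seen while running for 2ms out of an
enabled 8ms gives scale_count(1000000, 8000000, 2000000) == 4000000. Whether
the other 6ms really looked like the 2ms that were measured is exactly the
part I'm not convinced about.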

David
