Re: [RFC PATCH 6/7] perf, x86: large PEBS interrupt threshold

From: Peter Zijlstra
Date: Wed May 28 2014 - 11:02:28 EST


On Wed, May 28, 2014 at 02:54:25PM +0200, Stephane Eranian wrote:
> On Wed, May 28, 2014 at 10:10 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> > On Wed, May 28, 2014 at 02:18:09PM +0800, Yan, Zheng wrote:
> >> PEBS always had the capability to log samples to its buffers without
> >> an interrupt. Traditionally perf has not used this but always set the
> >> PEBS threshold to one.
> >>
> >> For the common cases we still need to use the PMI because the PEBS
> > >> hardware has various limitations. The biggest one is that it cannot
> >> supply a callgraph. It also requires setting a fixed period, as the
> >> hardware does not support adaptive period. Another issue is that it
> >> cannot supply a time stamp and some other options.
> >
> > So the reason I've never done this is because Intel has never fully
> > explained the demuxing of pebs events.
> >
> > In particular, the 0x90 offset (IA32_PERF_GLOBAL_STATUS). Intel once
> > confirmed to me that that is a direct copy of the similarly named MSR at
> > the time of the PEBS assist.
> >
> > This is a problem, since if multiple counters overflow, multiple bits
> > will be set and it's (afaict) ambiguous which event is for which counter.
> >
> I am not sure how having only one entry in the PEBS buffer solves this.
> I think PEBS will create only one entry if multiple counters overflow
> simultaneously.

For events that don't overflow at exactly the same time, a threshold of
one narrows the window in which another event can overflow and raise its
bit, because each record immediately raises the PMI and the PMU gets
disabled.

Remember, the status bit gets raised when the counter overflows, but
the PEBS assist, and therefore the hardware reset, can take a long while
to actually happen. So there are fairly large windows in which multiple
bits are set.

And if you have the auto-refresh, does that clear the status bit again?
Supposing it does (it's the sane thing to do), you can actually have 3
bits set, one for an event that hasn't even had a PEBS assist yet.

So while the problem still exists for a single event, it's much worse if
you just let the thing run.

> That OVFL_STATUS bitmask will have multiple bits
> set. I understand the problem in perf_events because you need to
> assign a sample to an event and not all events may record the same
> info in the sampling buffer.

Right, so you raise the issue that a PEBS assist triggered by two events
on the exact same cycle will only create one record with two bits set.
That's worse, because it's indistinguishable from the case where two
separate but nearby events are recorded, one of which also has two bits
set (because the second counter did overflow but no assist has triggered
yet).

So here we have two distinct scenarios in which multiple bits are set and
no way to disambiguate them.

All in all, it's a complete and utter trainwreck, and the worst part is
that I've raised this issue multiple times, starting some 5 years ago,
and nothing has happened afaik.
