Re: I.5 - Mmaped count

From: stephane eranian
Date: Mon Jun 22 2009 - 08:25:43 EST

On Mon, Jun 22, 2009 at 1:52 PM, Ingo Molnar<mingo@xxxxxxx> wrote:
>> 5/ Mmaped count
>> It is possible to read counts directly from user space for
>> self-monitoring threads. This leverages a HW capability present on
>> some processors. On X86, this is possible via RDPMC.
>> The full 64-bit count is constructed by combining the hardware
>> value extracted with an assembly instruction and a base value made
>> available thru the mmap. There is an atomic generation count
>> available to deal with the race condition.
>> I believe there is a problem with this approach given that the PMU
>> is shared and that events can be multiplexed. That means that even
>> though you are self-monitoring, events get replaced on the PMU.
>> The assembly instruction is unaware of that, it reads a register
>> not an event.
>> On x86, assume event A is hosted in counter 0, thus you need
>> RDPMC(0) to extract the count. But then, the event is replaced by
>> another one which reuses counter 0. At the user level, you will
>> still use RDPMC(0) but it will read the HW value from a different
>> event and combine it with a base count from another one.
>> To avoid this, you need to pin the event so it stays in the PMU at
>> all times. Now, here is something unclear to me. Pinning does not
>> mean stay in the SAME register, it means the event stays on the
>> PMU but it can possibly change register. To prevent that, I
>> believe you need to also set exclusive so that no other group can
>> be scheduled, and thus possibly use the same counter.
>> Looks like this is the only way you can make this actually work.
>> Not setting pinned+exclusive is another pitfall that many
>> people will fall into.
>   do {
>     seq = pc->lock;
>     barrier();
>     if (pc->index) {
>       count = pmc_read(pc->index - 1);
>       count += pc->offset;
>     } else
>       goto regular_read;
>     barrier();
>   } while (pc->lock != seq);
> We don't see the hole you are referring to. The sequence lock
> ensures you get a consistent view.
Let's take an example, with two groups, one event in each group.
Both events are scheduled on counter0, i.e., rdpmc(0). The 2 groups
are multiplexed, one each tick. The user gets 2 file descriptors
and thus two mmap'ed pages.

Suppose the user wants to read, using the above loop, the value of the
event in the first group BUT it's the 2nd group that is currently active
and loaded on counter0, i.e., rdpmc(0) returns the value of the 2nd event.

Unless you tell me that pc->index is marked invalid (0) when the
event is not scheduled, I don't see how you can avoid reading
the wrong value. I am assuming that if the event is not scheduled,
lock remains constant.
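
To make the scenario concrete, here is a small user-space simulation of it.
This is only a sketch, not kernel code: the field names lock, index and
offset and the helper pmc_read() come from the loop quoted above, while
struct ctl_page, hw_counter[], sched_in(), sched_out(), fast_read() and the
clear_index flag are hypothetical names made up for illustration. It assumes
the kernel folds the hardware value into offset when an event is scheduled
out, and it compares what the reader sees when index is left untouched
versus when it is reset to 0.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for one mmap'ed control page (one per event fd). */
struct ctl_page {
    uint32_t lock;    /* sequence count, bumped around every update */
    uint32_t index;   /* hardware counter number + 1, or 0 if not loaded */
    int64_t  offset;  /* base count to add to the hardware value */
};

#define NUM_COUNTERS 4
static uint64_t hw_counter[NUM_COUNTERS];   /* simulated PMU registers */

/* Compiler-only barrier, standing in for the kernel's barrier(). */
#define barrier() atomic_signal_fence(memory_order_seq_cst)

/* Simulated RDPMC: returns whatever currently sits in the register. */
static uint64_t pmc_read(uint32_t idx)
{
    assert(idx < NUM_COUNTERS);
    return hw_counter[idx];
}

/* The quoted read loop. Returns 1 on success, 0 if the caller must fall
 * back to a regular read (the "goto regular_read" path). */
static int fast_read(const volatile struct ctl_page *pc, uint64_t *out)
{
    uint32_t seq, idx;
    uint64_t count;

    do {
        seq = pc->lock;
        barrier();
        idx = pc->index;
        if (idx == 0)
            return 0;
        count = pmc_read(idx - 1) + (uint64_t)pc->offset;
        barrier();
    } while (pc->lock != seq);

    *out = count;
    return 1;
}

/* Simulated kernel side: load an event on a counter, starting from 0. */
static void sched_in(struct ctl_page *pc, uint32_t counter)
{
    pc->lock++;
    hw_counter[counter] = 0;
    pc->index = counter + 1;
    pc->lock++;
}

/* Simulated kernel side: unload an event (index must be non-zero here),
 * saving its hardware value in offset. clear_index selects whether index
 * is invalidated. */
static void sched_out(struct ctl_page *pc, int clear_index)
{
    pc->lock++;
    pc->offset += (int64_t)hw_counter[pc->index - 1];
    if (clear_index)
        pc->index = 0;
    pc->lock++;
}

/* Two groups multiplexed on counter 0. Event A counts 100, is switched
 * out, then event B counts 7. Returns what a fast read of A yields while
 * B is loaded, or -1 if the reader fell back to a regular read. */
int64_t read_a_while_b_active(int clear_index)
{
    struct ctl_page a = {0, 0, 0}, b = {0, 0, 0};
    uint64_t val;

    sched_in(&a, 0);
    hw_counter[0] = 100;
    sched_out(&a, clear_index);
    sched_in(&b, 0);
    hw_counter[0] = 7;

    if (!fast_read(&a, &val))
        return -1;
    return (int64_t)val;
}
```

In this model, read_a_while_b_active(0) returns 107: the lock on A's page
is stable, so the loop exits normally, but the value is A's base count plus
B's hardware count. read_a_while_b_active(1) returns -1, i.e., the reader
takes the regular read path because index was invalidated.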

Assuming the event is active when you enter the loop and you
read a value. How to get the timing information to scale the