Re: [RFC PATCH 2/3 v3] perf: Implement Nehalem uncore pmu

From: Stephane Eranian
Date: Thu Dec 09 2010 - 18:46:47 EST


Hi,

So I have tested this patch a bit on WSM and as I expected there
are issues with sampling.

When HT is on, both siblings CPUs get the interrupt. The HW does not
allow you to only point interrupts to a single HT thread (CPU).

I did verify that indeed both threads get the interrupt and that you have a
race condition. Both sibling CPUs stop uncore, get the status. They may get
the same overflow status. Both will pass the uncore->active_mask because
it's shared among siblings cores. Thus, you have a race for the whole
interrupt handler execution.

You need some serialization in there. But the patch does not address this.
The problem is different from the back-to-back interrupt issue that
Don worked on.
The per-cpu marked/handled trick cannot work to avoid this problem.

You cannot simply say "the lowest indexed" CPU of a sibling pair
handles the interrupt
because you don't know if this in an uncore intr, core interrupt or
something else. You
need to check. That means each HT thread needs to check uncore
ovfl_status. IF the
status is zero, then return. Otherwise, you need to do a 2nd level
check before you can
execute the handler. You need to know if the sibling CPU has already
"consumed" that
interrupt.

I think you need some sort of generation counter per physical core and
per HT thread.
On interrupt, you could do something along the line of:
if (mycpu->intr_count == mysibling->intr_count) {
then mycpu->intr_count++
execute intr_handler()
} else {
mycpu->intr_count++
return;
}
Of course, the above needs some atomicity and ad locking (but I don't
think you can
use locks in NMI context).

This makes me wonder if vectoring uncore to NMI is really needed,
given you cannot
correlated to an IP, incl. a kernel IP. If we were to vector to a
dedicated (lower prio)
vector, then we could use the trick of saying the lowest indexed CPU in a pair
handles the interrupt (because we would already know this is an uncore
interrupt).
This would be much simpler. Price: not samples in kernel's critical
section. But those
are useless anyway with uncore events.

- uncore_get_status().
PERF_GLOBAL_STATUS contains more than overflow
status, bit 61-63 need to be masked off.

- uncore_pmu_cpu_prepare()
I don't understand the x86_max_cores < 2 test. If you run your
NHM/WSM is single core mode, you still have uncore to deal with
thus, you need cpuc->intel_uncore initialized, don't you?

- I assume that the reason the uncore->refcnt management is not
using atomic ops because the whole CPU hotplug is serialized, right?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/