RE: [Ptools-perfapi] [perfmon2] [PATCH] perf_events: AMD event scheduling (v1)

From: John McCalpin
Date: Fri Jan 22 2010 - 13:52:58 EST

The comments in perfctr's linux/drivers/perfctr/x86.c driver file include a note on this counter-reset problem.
In perfctr version 2.6.31, item (2) of the following comment describes it:
* Multicore K8s have issues with northbridge events:
* 1. The NB is shared between the cores, so two different cores
*    in the same node cannot count NB events simultaneously.
*    This can be handled by using perfctr_cpus_forbidden_mask to
*    restrict NB-using threads to core0 of all nodes.
* 2. The initial multicore chips (Revision E) have an erratum
*    which causes the NB counters to be reset when either core
*    reprograms its evntsels (even for non-NB events).
*    This is only an issue because of scheduling of threads, so
*    we restrict NB events to the non thread-centric API.
* For now we only implement the workaround for issue 2, as this
* also handles issue 1.
* TODO: Detect post Revision E chips and implement a weaker
* workaround for them.
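
As an illustration of item (1) only, here is a minimal sketch of how a "forbidden" CPU mask could be built so that NB-using threads stay on core0 of each node. This is not perfctr code: the function name nb_forbidden_mask is hypothetical, and the sketch assumes CPUs are numbered contiguously by node and that there are at most 64 of them.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from perfctr): return a bitmask with one bit
 * set for every CPU that should NOT run NB-event threads, i.e. every CPU
 * that is not core 0 of its node.
 *
 * Assumptions of this sketch: CPUs are numbered contiguously by node
 * (node n owns CPUs n*cores_per_node .. (n+1)*cores_per_node - 1) and
 * ncpus fits in a 64-bit mask.
 */
uint64_t nb_forbidden_mask(unsigned ncpus, unsigned cores_per_node)
{
    uint64_t mask = 0;

    assert(cores_per_node > 0 && ncpus <= 64);

    for (unsigned cpu = 0; cpu < ncpus; cpu++) {
        /* cpu % cores_per_node is the core index within the node */
        if (cpu % cores_per_node != 0)
            mask |= (uint64_t)1 << cpu;
    }
    return mask;
}
```

With two dual-core nodes (4 CPUs), this forbids CPUs 1 and 3 and leaves CPUs 0 and 2 available for NB events.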

I have gone back through the AMD Opteron Revision Guide for these processors
but I don't see any publicly disclosed errata that appear to be related to this issue.

Perhaps I will check it on my Athlon64FX system at home this weekend....


-----Original Message-----
From: Peter Zijlstra [mailto:peterz@xxxxxxxxxxxxx]
Sent: Friday, January 22, 2010 11:42 AM
To: John McCalpin
Cc: 'Dan Terpstra'; eranian@xxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; ptools-perfapi@xxxxxxxxxxxx; perfmon2-devel@xxxxxxxxxxxx; fweisbec@xxxxxxxxx; paulus@xxxxxxxxx; mingo@xxxxxxx; davem@xxxxxxxxxxxxx
Subject: RE: [Ptools-perfapi] [perfmon2] [PATCH] perf_events: AMD event scheduling (v1)

On Fri, 2010-01-22 at 11:33 -0600, John McCalpin wrote:

> * Think of the system as having four performance monitors per core
> *plus* four performance monitors for the "shared" structures on the
> chip (L3, crossbar, HyperTransport links, memory controllers).

Would have been nice to have them as a separately addressable pmu
instead of shadowing the logical cpu's pmu.

But that's all ancient history of course..

> There is an additional hazard when working with early K8 processors --
> a hardware bug causes the counts of all shared counters to be reset to
> zero any time any shared register is programmed. This makes
> "protecting" users somewhat more difficult....

Could you qualify "early K8" a bit more? It shouldn't be hard to add a
quirk for a specific set of CPUs to read/reset all counters before
writing to the shared PMU.
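
A rough user-space sketch of the read/reset-before-write idea follows. It is a toy model, not kernel code and not real MSR access: struct nb_pmu, buggy_write_evntsel, quirk_write_evntsel and nb_read are all hypothetical names, and the bug is modelled as described in the quoted text above (programming any shared select register zeroes every shared counter).

```c
#include <assert.h>
#include <stdint.h>

#define NB_COUNTERS 4   /* four shared counters, as described above */

/* Toy model of the shared PMU; all names here are hypothetical. */
struct nb_pmu {
    uint64_t evntsel[NB_COUNTERS];  /* event select registers */
    uint64_t hw_count[NB_COUNTERS]; /* "hardware" counter values */
    uint64_t sw_total[NB_COUNTERS]; /* counts already folded into software */
};

/* Models the reported bug: writing any shared select register
 * resets every shared counter to zero. */
void buggy_write_evntsel(struct nb_pmu *pmu, int idx, uint64_t val)
{
    assert(idx >= 0 && idx < NB_COUNTERS);
    pmu->evntsel[idx] = val;
    for (int i = 0; i < NB_COUNTERS; i++)
        pmu->hw_count[i] = 0;
}

/* Quirk: read and reset all counters before the write, so the
 * hardware-side reset has nothing left to lose. */
void quirk_write_evntsel(struct nb_pmu *pmu, int idx, uint64_t val)
{
    assert(idx >= 0 && idx < NB_COUNTERS);
    for (int i = 0; i < NB_COUNTERS; i++) {
        pmu->sw_total[i] += pmu->hw_count[i];
        pmu->hw_count[i] = 0;
    }
    buggy_write_evntsel(pmu, idx, val);
    /* Choice made by this sketch: the newly programmed event starts
     * from zero; the caller reads the old event's total beforehand. */
    pmu->sw_total[idx] = 0;
}

/* Current value of counter idx as seen by software. */
uint64_t nb_read(const struct nb_pmu *pmu, int idx)
{
    assert(idx >= 0 && idx < NB_COUNTERS);
    return pmu->sw_total[idx] + pmu->hw_count[idx];
}
```

In this model, reprogramming counter 2 through quirk_write_evntsel leaves the values reported by nb_read for counters 0, 1 and 3 unchanged, whereas calling buggy_write_evntsel directly discards whatever was still held in hw_count.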
