* Jiri Kosina <jkosina@xxxxxxx> wrote:
On Thu, 13 Nov 2008, Ingo Molnar wrote:
Yup, I first wanted to make this known to the public in the hope that it will ring a bell somewhere. I haven't yet found the time to start bisecting this. Would be nice to identify a commit to revert - in case we run out of time fixing it.
If no one sees an obvious reason for this, I will do my best to bisect this tomorrow.
We've got the one patch below pending, but that's not for AMD CPUs so it shouldn't impact your case.
But ... some change made it all much more fragile. I'm curious why things became more fragile.
Ingo
--------------->
Subject: oprofile: un-mask APIC before resetting counter in ppro_check_ctrs()
From: Eric Dumazet <dada1@xxxxxxxxxxxxx>
Date: Tue, 11 Nov 2008 09:32:12 +0100
While using oprofile on my HP BL460c G1 (two quad-core Intel E5450 CPUs),
I noticed that one CPU after the other stopped receiving NMIs.
After a while, all cores were blocked (i.e. not generating events for oprofile).
I tried all major Linux versions and all were affected by this freeze.
I found that we have to unmask the APIC *before* writing to the MSR counter
when we get an event notification, because we use APIC_LVTPC in edge-triggered mode.
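
To illustrate why the ordering matters, here is a minimal toy model in C. It is not kernel code: struct toy_pmu and every toy_* / handler_* name are hypothetical, and it assumes that delivering a counter interrupt leaves the LVT entry masked and that an overflow edge hitting a masked entry is simply dropped.

```c
#include <stdbool.h>

/* Toy model only; all names are hypothetical, not kernel symbols. */
struct toy_pmu {
	bool lvt_masked;	/* stands in for APIC_LVT_MASKED in APIC_LVTPC */
	bool armed;		/* counter reloaded and able to overflow */
	int delivered;		/* overflow edges that became an NMI */
	int lost;		/* overflow edges that hit a masked entry */
};

/* Edge-triggered: an overflow is seen once, at the moment it happens. */
static void toy_overflow(struct toy_pmu *p)
{
	if (!p->armed)
		return;
	p->armed = false;
	if (p->lvt_masked) {
		p->lost++;		/* edge dropped, nothing re-arms the counter */
	} else {
		p->delivered++;
		p->lvt_masked = true;	/* assumption: delivery masks the entry */
	}
}

/* Old ordering: reload the counter first, unmask afterwards. */
static void handler_old(struct toy_pmu *p, bool overflow_in_window)
{
	p->armed = true;		/* write reset_value to the counter */
	if (overflow_in_window)
		toy_overflow(p);	/* overflow arrives while still masked */
	p->lvt_masked = false;		/* unmask, too late for that edge */
}

/* New ordering: unmask first, then reload the counter. */
static void handler_new(struct toy_pmu *p, bool overflow_in_window)
{
	p->lvt_masked = false;		/* unmask before touching the counter */
	p->armed = true;		/* write reset_value to the counter */
	if (overflow_in_window)
		toy_overflow(p);	/* the same edge is now delivered */
}
```

Starting from a masked entry, handler_old() with an overflow in the window ends with lost == 1 and the counter no longer armed, which in this model matches a CPU that silently stops generating events. handler_new() delivers the same overflow.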
Signed-off-by: Eric Dumazet <dada1@xxxxxxxxxxxxx>
Signed-off-by: Ingo Molnar <mingo@xxxxxxx>
---
arch/x86/oprofile/op_model_ppro.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
Index: tip/arch/x86/oprofile/op_model_ppro.c
===================================================================
--- tip.orig/arch/x86/oprofile/op_model_ppro.c
+++ tip/arch/x86/oprofile/op_model_ppro.c
@@ -126,6 +126,12 @@ static int ppro_check_ctrs(struct pt_reg
u64 val;
int i;
+ /*
+ * We need to unmask the apic vector *before* writing reset_value
+ * to msr counter, because we use edge trigger
+ */
+ apic_write(APIC_LVTPC, apic_read(APIC_LVTPC) & ~APIC_LVT_MASKED);
+
for (i = 0 ; i < num_counters; ++i) {
if (!reset_value[i])
continue;
@@ -136,10 +142,6 @@ static int ppro_check_ctrs(struct pt_reg
}
}
- /* Only P6 based Pentium M need to re-unmask the apic vector but it
- * doesn't hurt other P6 variant */
- apic_write(APIC_LVTPC, apic_read(APIC_LVTPC) & ~APIC_LVT_MASKED);
-
/* We can't work out if we really handled an interrupt. We
* might have caught a *second* counter just after overflowing
* the interrupt for this counter then arrives