Re: [PATCH -v3] perf, x86: try to handle unknown nmis with runningperfctrs

From: Don Zickus
Date: Fri Aug 27 2010 - 14:58:12 EST


On Fri, Aug 27, 2010 at 10:10:38AM +0200, Robert Richter wrote:
> >
> > inc_irq_stat(apic_perf_irqs);
> > ack = status;
> > + intel_pmu_ack_status(ack);
>
> I would slightly change the patch:
>
> There is no need for the ack variable anymore, you could directly work
> with the status.
>
> I would call intel_pmu_ack_status() as close as possible after the
> intel_pmu_get_status(), which is after 'again:'.
>

Here is another version of the patch, which uses Robert's suggestion of
removing the ack variable

Cheers,
Don

--

From: Don Zickus <dzickus@xxxxxxxxxx>
Date: Fri, 27 Aug 2010 14:43:03 -0400
Subject: [PATCH] [x86] perf: fix accidentally ack'ing a second event on intel perf counter

During testing of a patch to stop having the perf subsytem swallow nmis,
it was uncovered that Nehalem boxes were randomly getting unknown nmis
when using the perf tool.

Moving the ack'ing of the PMI closer to when we get the status allows
the hardware to properly re-set the PMU bit signaling another PMI was
triggered during the processing of the first PMI. This allows the new
logic for dealing with the shortcomings of multiple PMIs to handle the
extra NMI by 'eat'ing it later.

Now one can wonder why are we getting a second PMI when we disable all
the PMUs in the begining of the NMI handler to prevent such a case, for
that I do not know. But I know the fix below helps deal with this quirk.

Tested on multiple Nehalems where the problem was occuring. With the
patch, the code now loops a second time to handle the second PMI (whereas
before it was not).

Signed-off-by: Don Zickus <dzickus@xxxxxxxxxx>
---
arch/x86/kernel/cpu/perf_event_intel.c | 6 ++----
1 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index 4539b4b..ee05c90 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -712,7 +712,7 @@ static int intel_pmu_handle_irq(struct pt_regs *regs)
struct perf_sample_data data;
struct cpu_hw_events *cpuc;
int bit, loops;
- u64 ack, status;
+ u64 status;
int handled = 0;

perf_sample_data_init(&data, 0);
@@ -729,6 +729,7 @@ static int intel_pmu_handle_irq(struct pt_regs *regs)

loops = 0;
again:
+ intel_pmu_ack_status(status);
if (++loops > 100) {
WARN_ONCE(1, "perfevents: irq loop stuck!\n");
perf_event_print_debug();
@@ -737,7 +738,6 @@ again:
}

inc_irq_stat(apic_perf_irqs);
- ack = status;

intel_pmu_lbr_read();

@@ -766,8 +766,6 @@ again:
x86_pmu_stop(event);
}

- intel_pmu_ack_status(ack);
-
/*
* Repeat if there is more work to be done:
*/
--
1.7.2.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/