[RFC] perf/x86: PMU IRQ handler issues

From: Stephane Eranian
Date: Wed May 28 2014 - 15:48:55 EST


Hi,

A few days ago, I was alerted that under heavy network load, something goes
wrong with perf_event sampling in frequency mode (as used by perf top, for
instance). The number of samples was far too low given the cycle count
reported by perf stat. Looking at the syslog, I noticed that the perf IRQ
latency throttler had kicked in several times. There may be several reasons
for this.

Maybe the workload had changing phases and the frequency adjustment was not
working properly: the period dropped to a very small value, which then
generated a flood of interrupts.

Another explanation is that because we ACK the NMI early, we leave the door
open to other interrupts, including the NIC's, which then interrupt the
execution of the PMU IRQ handler. Yet that detour is counted in the PMU
handler latency, causing more throttling than needed. Is that a plausible
scenario too? If so, I think we need to narrow the window for timing errors
by ACKing late on all processors, not just HSW.

I still suspect there is something wrong with the frequency mode.

Any better explanation for the problem?