On Tue, Nov 10, 2020 at 7:37 AM Peter Zijlstra<peterz@xxxxxxxxxxxxx> wrote:Thanks Stephane for clarifying the use cases for Freeze-on-[PMI|Overflow].
On Tue, Nov 10, 2020 at 04:12:57PM +0100, Peter Zijlstra wrote:
On Mon, Nov 09, 2020 at 10:12:37AM +0800, Like Xu wrote:
The Precise Event Based Sampling(PEBS) supported on Intel Ice Lake server
platforms can provide an architectural state of the instruction executed
after the instruction that caused the event. This patch set enables the
the PEBS via DS feature for KVM (also non) Linux guest on the Ice Lake.
The Linux guest can use PEBS feature like native:
# perf record -e instructions:ppp ./br_instr a
# perf record -c 100000 -e instructions:pp ./br_instr a
If the counter_freezing is not enabled on the host, the guest PEBS will
be disabled on purpose when host is using PEBS facility. By default,
KVM disables the co-existence of guest PEBS and host PEBS.
What is implemented is Freeze-on-Overflow, yet it is described as Freeze-on-PMI.Uuhh, what?!? counter_freezing should never be enabled, its broken. Let---
me go delete all that code.
Subject: perf/intel: Remove Perfmon-v4 counter_freezing support
Perfmon-v4 counter freezing is fundamentally broken; remove this default
disabled code to make sure nobody uses it.
The feature is called Freeze-on-PMI in the SDM, and if it would do that,
there wouldn't actually be a problem,*however* it does something subtly
different. It globally disables the whole PMU when it raises the PMI,
not when the PMI hits.
This means there's a window between the PMI getting raised and the PMI
actually getting served where we loose events and this violates the
perf counter independence. That is, a counting event should not result
in a different event count when there is a sampling event co-scheduled.
That, in itself, is a problem. I agree with you on that point.
However, there are use cases for both modes.
I can sample on event A and count on B, C and when A overflows, I want
to snapshot B, C.
For that I want B, C at the moment of the overflow, not at the moment
the PMI is delivered. Thus, youd
would want the Freeze-on-overflow behavior. You can collect in this
mode with the perf tool,
IIRC: perf record -e '{cycles,instructions,branches:S}' ....
The other usage model is that of the replay-debugger (rr) which you are alluding
to, which needs precise count of an event including during the skid
window. For that, you need
Freeze-on-PMI (delivered). Note that this tool likely only cares about
user level occurrences of events.
As for counter independence, I am not sure it holds in all cases. If
the events are setup for user+kernel
then, as soon as you co-schedule a sampling event, you will likely get
more counts on the counting
event due to the additional kernel entries/exits caused by
interrupt-based profiling. Even if you were to
restrict to user level only, I would expect to see a few more counts.
This is known to break existing software.
Signed-off-by: Peter Zijlstra (Intel)<peterz@xxxxxxxxxxxxx>