[patch] perf_event_open.2: PERF_RECORD_SWITCH support
From: Vince Weaver
Date: Tue Oct 18 2016 - 13:22:37 EST
Linux 4.3 introduced two new record types for recording context
switches: PERF_RECORD_SWITCH and PERF_RECORD_SWITCH_CPU_WIDE.
The advantage over the existing tracepoint and software context
switch events is primarily that full switch in/out data can be
gathered even in the face of restrictive perf_event_paranoid
settings.
Signed-off-by: Vince Weaver <vincent.weaver@xxxxxxxxx>
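---
Not part of the commit message: a rough sketch of how a tool might use
the new attr bit and misc flag described above. It assumes kernel
headers from Linux 4.3 or later. The helper names init_switch_attr()
and switch_direction() are made up for illustration, and using a dummy
software event as the carrier is only one possible choice.

```c
#include <linux/perf_event.h>
#include <string.h>

/* Hypothetical helper: set up an attr that asks only for context-switch
 * records. The dummy software event counts nothing by itself. */
static void init_switch_attr(struct perf_event_attr *attr)
{
    memset(attr, 0, sizeof(*attr));
    attr->size = sizeof(*attr);
    attr->type = PERF_TYPE_SOFTWARE;
    attr->config = PERF_COUNT_SW_DUMMY;
    attr->context_switch = 1;   /* request PERF_RECORD_SWITCH* records */
    attr->sample_id_all = 1;    /* append sample_id to non-sample records */
}

/* Hypothetical helper: classify a record read from the mmap ring buffer.
 * Returns 1 for a switch out, 0 for a switch in, and -1 if the record
 * is not a context-switch record at all. */
static int switch_direction(const struct perf_event_header *hdr)
{
    if (hdr->type != PERF_RECORD_SWITCH &&
        hdr->type != PERF_RECORD_SWITCH_CPU_WIDE)
        return -1;
    return (hdr->misc & PERF_RECORD_MISC_SWITCH_OUT) ? 1 : 0;
}
```

The attr would then be passed to perf_event_open() as usual, and
switch_direction() applied to each record header pulled from the ring
buffer.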
diff --git a/man2/perf_event_open.2 b/man2/perf_event_open.2
index 68b99bb..04a0cf5 100644
--- a/man2/perf_event_open.2
+++ b/man2/perf_event_open.2
@@ -243,8 +243,9 @@ struct perf_event_attr {
comm_exec : 1, /* flag comm events that are
due to exec */
use_clockid : 1, /* use clockid for time fields */
+ context_switch : 1, /* context switch data */
- __reserved_1 : 38;
+ __reserved_1 : 37;
union {
__u32 wakeup_events; /* wakeup every n events */
@@ -1112,6 +1113,21 @@ field.
This can make it easier to correlate perf sample times with
timestamps generated by other tools.
.TP
+.IR "context_switch" " (since Linux 4.3)"
+.\" commit 45ac1403f564f411c6a383a2448688ba8dd705a4
+This enables the generation of
+.B PERF_RECORD_SWITCH
+records when a context switch occurs.
+It also enables the generation of
+.B PERF_RECORD_SWITCH_CPU_WIDE
+records when sampling in CPU-wide mode.
+This functionality is in addition to existing tracepoint and
+software events for measuring context switches.
+The advantage of this method is that it will give full
+information even with strict
+.I perf_event_paranoid
+settings.
+.TP
.IR "wakeup_events" ", " "wakeup_watermark"
This union sets how many samples
.RI ( wakeup_events )
@@ -1792,7 +1808,8 @@ Sample happened in guest user code.
.RE
.RS
-In addition, one of the following bits can be set:
+The following three statuses are generated by
+different record types, so they alias to the same bit:
.TP
.BR PERF_RECORD_MISC_MMAP_DATA " (since Linux 3.10)"
.\" commit 2fe85427e3bf65d791700d065132772fc26e4d75
@@ -1807,9 +1824,18 @@ record on kernels more recent than Linux 3.16
if a process name change was caused by an
.BR exec (2)
system call.
-It is an alias for
-.B PERF_RECORD_MISC_MMAP_DATA
-since the two values would not be set in the same record.
+.TP
+.BR PERF_RECORD_MISC_SWITCH_OUT " (since Linux 4.3)"
+.\" commit 45ac1403f564f411c6a383a2448688ba8dd705a4
+When a
+.BR PERF_RECORD_SWITCH " or " PERF_RECORD_SWITCH_CPU_WIDE
+record is generated, this bit indicates that the
+context switch is away from the current process
+(instead of into the current process).
+.RE
+
+.RS
+In addition, the following bits can be set:
.TP
.B PERF_RECORD_MISC_EXACT_IP
This indicates that the content of
@@ -2583,6 +2609,59 @@ struct {
.I lost
the number of potentially lost samples.
.RE
+.TP
+.BR PERF_RECORD_SWITCH " (since Linux 4.3)"
+.\" commit 45ac1403f564f411c6a383a2448688ba8dd705a4
+This record indicates a context switch has happened.
+The
+.B PERF_RECORD_MISC_SWITCH_OUT
+bit in the
+.I misc
+field indicates whether it was a context switch into
+or away from the current process.
+
+.in +4n
+.nf
+struct {
+ struct perf_event_header header;
+ struct sample_id sample_id;
+};
+.fi
+.TP
+.BR PERF_RECORD_SWITCH_CPU_WIDE " (since Linux 4.3)"
+.\" commit 45ac1403f564f411c6a383a2448688ba8dd705a4
+As with
+.B PERF_RECORD_SWITCH
+this record indicates a context switch has happened,
+but it only occurs when sampling in CPU-wide mode
+and provides additional information on the process
+being switched to/from.
+The
+.B PERF_RECORD_MISC_SWITCH_OUT
+bit in the
+.I misc
+field indicates whether it was a context switch into
+or away from the current process.
+
+.in +4n
+.nf
+struct {
+ struct perf_event_header header;
+ u32 next_prev_pid;
+ u32 next_prev_tid;
+ struct sample_id sample_id;
+};
+.fi
+.RS
+.TP
+.I next_prev_pid
+The process ID of the previous (if switching in)
+or next (if switching out) process on the CPU.
+.TP
+.I next_prev_tid
+The thread ID of the previous (if switching in)
+or next (if switching out) thread on the CPU.
+.RE
.RE
.SS Overflow handling
Events can be set to notify when a threshold is crossed,