Re: [patch] perf_event_open.2: PERF_RECORD_SWITCH support

From: Michael Kerrisk (man-pages)
Date: Wed Oct 19 2016 - 03:00:35 EST


Hi Vince,

On 10/18/2016 07:22 PM, Vince Weaver wrote:
>
> Linux 4.3 introduced two new record types for recording context
> switches: PERF_RECORD_SWITCH and PERF_RECORD_SWITCH_CPU_WIDE.
>
> The advantage over the existing tracepoint and software context
> switch events is primarily that full switch in/out data can be
> gathered even in the face of restrictive perf_event_paranoid
> settings.
>
> Signed-off-by: Vince Weaver <vincent.weaver@xxxxxxxxx>

Thanks! Applied. One query below.

> diff --git a/man2/perf_event_open.2 b/man2/perf_event_open.2
> index 68b99bb..04a0cf5 100644
> --- a/man2/perf_event_open.2
> +++ b/man2/perf_event_open.2
> @@ -243,8 +243,9 @@ struct perf_event_attr {
> comm_exec : 1, /* flag comm events that are
> due to exec */
> use_clockid : 1, /* use clockid for time fields */
> + context_switch : 1, /* context switch data */
>
> - __reserved_1 : 38;
> + __reserved_1 : 37;
>
> union {
> __u32 wakeup_events; /* wakeup every n events */
> @@ -1112,6 +1113,21 @@ field.
> This can make it easier to correlate perf sample times with
> timestamps generated by other tools.
> .TP
> +.IR "context_switch" " (since Linux 4.3)"
> +.\" commit 45ac1403f564f411c6a383a2448688ba8dd705a4
> +This enables the generation of
> +.B PERF_RECORD_SWITCH
> +records when a context switch occurs.
> +It also enables the generation of
> +.B PERF_RECORD_SWITCH_CPU_WIDE
> +records when sampling in CPU-wide mode.
> +This functionality is in addition to existing tracepoint and
> +software events for measuring context switches.
> +The advantage of this method is that it will give full

s/give full/give a full/

ok?

> +information event with strict
> +.I perf_event_paranoid
> +settings.
> +.TP
> .IR "wakeup_events" ", " "wakeup_watermark"
> This union sets how many samples
> .RI ( wakeup_events )
> @@ -1792,7 +1808,8 @@ Sample happened in guest user code.
> .RE
>
> .RS
> -In addition, one of the following bits can be set:
> +The following three statuses are generated by
> +different record types so they alias to the same bit:
> .TP
> .BR PERF_RECORD_MISC_MMAP_DATA " (since Linux 3.10)"
> .\" commit 2fe85427e3bf65d791700d065132772fc26e4d75
> @@ -1807,9 +1824,18 @@ record on kernels more recent than Linux 3.16
> if a process name change was caused by an
> .BR exec (2)
> system call.
> -It is an alias for
> -.B PERF_RECORD_MISC_MMAP_DATA
> -since the two values would not be set in the same record.
> +.TP
> +.BR PERF_RECORD_MISC_SWITCH_OUT " (since Linux 4.3)"
> +.\" commit 45ac1403f564f411c6a383a2448688ba8dd705a4
> +When a
> +.BR PERF_RECORD_SWITCH " or " PERF_RECORD_SWITCH_CPU_WIDE
> +record is generated, this bit indicates that the
> +context switch is away from the current process
> +(instead of into the current process).
> +.RE
> +
> +.RS
> +In addition, the following bits can be set:
> .TP
> .B PERF_RECORD_MISC_EXACT_IP
> This indicates that the content of
> @@ -2583,6 +2609,59 @@ struct {
> .I lost
> the number of potentially lost samples.
> .RE
> +.TP
> +.BR PERF_RECORD_SWITCH " (since Linux 4.3)"
> +.\" commit 45ac1403f564f411c6a383a2448688ba8dd705a4
> +This record indicates a context switch has happened.
> +The
> +.B PERF_RECORD_MISC_SWITCH_OUT
> +bit in the
> +.I misc
> +field indicates whether it was a context switch into
> +or away from the current process.
> +
> +.in +4n
> +.nf
> +struct {
> + struct perf_event_header header;
> + struct sample_id sample_id;
> +};
> +.fi
> +.TP
> +.BR PERF_RECORD_SWITCH_CPU_WIDE " (since Linux 4.3)"
> +.\" commit 45ac1403f564f411c6a383a2448688ba8dd705a4
> +As with
> +.B PERF_RECORD_SWITCH
> +this record indicates a context switch has happened,
> +but it only occurs when sampling in CPU-wide mode
> +and provides additional information on the process
> +being switched to/from.
> +The
> +.B PERF_RECORD_MISC_SWITCH_OUT
> +bit in the
> +.I misc
> +field indicates whether it was a context switch into
> +or away from the current process.
> +
> +.in +4n
> +.nf
> +struct {
> + struct perf_event_header header;
> + u32 next_prev_pid;
> + u32 next_prev_tid;
> + struct sample_id sample_id;
> +};
> +.fi
> +.RS
> +.TP
> +.I next_prev_pid
> +The process ID of the previous (if switching in)
> +or next (if switching out) process on the CPU.
> +.TP
> +.I next_prev_tid
> +The thread ID of the previous (if switching in)
> +or next (if switching out) thread on the CPU.
> +.RE
> .RE
> .SS Overflow handling
> Events can be set to notify when a threshold is crossed,
>
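
For anyone reading along, here is a minimal sketch in C of how the two
record layouts described in the patch might be decoded. It is only an
illustration under some assumptions: the record has already been copied
out of the mmap ring buffer into a contiguous buffer, and the names
decode_switch, struct switch_info, struct rec_header, REC_* and
MISC_SWITCH_OUT are hypothetical helpers that are not part of the kernel
API. The constant values mirror <linux/perf_event.h>, which real code
should include instead. The trailing sample_id is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Values mirror <linux/perf_event.h>; real code should use that header. */
#define REC_SWITCH           14          /* PERF_RECORD_SWITCH */
#define REC_SWITCH_CPU_WIDE  15          /* PERF_RECORD_SWITCH_CPU_WIDE */
#define MISC_SWITCH_OUT      (1 << 13)   /* PERF_RECORD_MISC_SWITCH_OUT */

/* Same layout as struct perf_event_header. */
struct rec_header {
    uint32_t type;
    uint16_t misc;
    uint16_t size;
};

/* Hypothetical result type for this sketch; not part of the kernel ABI. */
struct switch_info {
    int      cpu_wide;       /* 1 if the record was the CPU-wide variant */
    int      switch_out;     /* 1 if switching away from the current task */
    uint32_t next_prev_pid;  /* only meaningful when cpu_wide is 1 */
    uint32_t next_prev_tid;  /* only meaningful when cpu_wide is 1 */
};

/*
 * Decode one record that starts at buf and is at most len bytes long.
 * Returns 0 on success, or -1 if the record is truncated or is not a
 * context-switch record.
 */
int decode_switch(const void *buf, size_t len, struct switch_info *out)
{
    struct rec_header h;

    assert(buf != NULL && out != NULL);

    if (len < sizeof(h))
        return -1;
    memcpy(&h, buf, sizeof(h));

    if (h.size < sizeof(h) || h.size > len)
        return -1;
    if (h.type != REC_SWITCH && h.type != REC_SWITCH_CPU_WIDE)
        return -1;

    memset(out, 0, sizeof(*out));
    out->switch_out = (h.misc & MISC_SWITCH_OUT) != 0;

    if (h.type == REC_SWITCH_CPU_WIDE) {
        uint32_t ids[2];   /* next_prev_pid, next_prev_tid */

        if (h.size < sizeof(h) + sizeof(ids))
            return -1;
        memcpy(ids, (const unsigned char *)buf + sizeof(h), sizeof(ids));
        out->cpu_wide = 1;
        out->next_prev_pid = ids[0];
        out->next_prev_tid = ids[1];
    }
    return 0;
}
```

A caller would set context_switch to 1 in struct perf_event_attr before
perf_event_open(), then pass each record read from the ring buffer to
decode_switch(). Records that wrap around the end of the ring buffer
would need to be reassembled first, which this sketch does not attempt.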

Cheers,

Michael


--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/