[PATCH] perf: Allow suppressing AUX records

From: Alexander Shishkin
Date: Mon Jan 15 2018 - 10:00:42 EST


It has been pointed out to me many times that it is useful to be able
to switch off AUX records, for example in AUX overwrite mode, to save
the bandwidth for the records that actually matter.

The usefulness of PERF_RECORD_AUX is in some of its flags, like the
TRUNCATED flag, which tells the decoder exactly where the gaps in the
trace are. The OVERWRITE flag, on the other hand, is set on every single
record in overwrite mode. As a result, a PERF_RECORD_AUX[flags=OVERWRITE]
is generated on every sched_out of the target task, which over time adds
up to a lot of useless information.

In case existing userspace depends on AUX records in overwrite mode,
preserve the original behavior and make the new behavior opt-in: when
the new bit is set, records that carry no flag other than OVERWRITE
get suppressed.

This patch adds an attribute bit, attr.suppress_aux, to that effect.

Signed-off-by: Alexander Shishkin <alexander.shishkin@xxxxxxxxxxxxxxx>
Cc: Markus Metzger <markus.t.metzger@xxxxxxxxx>
Cc: Adrian Hunter <adrian.hunter@xxxxxxxxx>
---
include/uapi/linux/perf_event.h | 3 ++-
kernel/events/core.c | 5 +++++
kernel/events/ring_buffer.c | 13 +++++++++++--
3 files changed, 18 insertions(+), 3 deletions(-)
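
For reviewers, here is a minimal userspace sketch of the two checks this
patch adds. It is a model, not kernel code: aux_record_wanted() and
check_suppress_aux() are hypothetical helper names, the pmu->setup_aux
callback is reduced to a boolean, and any condition surrounding the call
site in perf_aux_output_end() that lies outside the hunk is not modelled.
The flag values are assumed to match include/uapi/linux/perf_event.h.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror the values in include/uapi/linux/perf_event.h. */
#define PERF_AUX_FLAG_TRUNCATED	0x01
#define PERF_AUX_FLAG_OVERWRITE	0x02

/* Same definition as in the ring_buffer.c hunk of this patch. */
#define SUPPRESSABLE_FLAGS	PERF_AUX_FLAG_OVERWRITE

/*
 * Hypothetical helper: models the condition added around
 * perf_event_aux_event(). A record is wanted unless suppression is
 * enabled and no flag outside SUPPRESSABLE_FLAGS is set.
 */
static inline bool aux_record_wanted(bool suppress_aux, uint64_t aux_flags)
{
	return !suppress_aux ||
	       (aux_flags & ~(uint64_t)SUPPRESSABLE_FLAGS) != 0;
}

/*
 * Hypothetical helper: models the perf_event_open() check. The bit is
 * rejected with -EINVAL when the PMU has no AUX support, represented
 * here by pmu_has_setup_aux instead of the pmu->setup_aux pointer.
 */
static inline int check_suppress_aux(bool suppress_aux, bool pmu_has_setup_aux)
{
	if (suppress_aux && !pmu_has_setup_aux)
		return -EINVAL;
	return 0;
}
```

With suppress_aux set, aux_record_wanted() is false for flags == 0 and
for flags == OVERWRITE, and true as soon as TRUNCATED is present, so gap
information still reaches the decoder.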

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index c77c9a2ebbbb..d7a981130561 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -370,7 +370,8 @@ struct perf_event_attr {
context_switch : 1, /* context switch data */
write_backward : 1, /* Write ring buffer from end to beginning */
namespaces : 1, /* include namespaces data */
- __reserved_1 : 35;
+ suppress_aux : 1, /* don't generate PERF_RECORD_AUX */
+ __reserved_1 : 34;

union {
__u32 wakeup_events; /* wakeup every n events */
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 4e1a1bf8d867..6245a88c2bda 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -10012,6 +10012,11 @@ SYSCALL_DEFINE5(perf_event_open,
goto err_context;
}

+ if (attr.suppress_aux && !pmu->setup_aux) {
+ err = -EINVAL;
+ goto err_context;
+ }
+
/*
* Look up the group leader (we will attach this event to it):
*/
diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index 141aa2ca8728..381f080e6409 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -426,6 +426,12 @@ static bool __always_inline rb_need_aux_wakeup(struct ring_buffer *rb)
return false;
}

+/*
+ * These flags won't generate a PERF_RECORD_AUX on their own if
+ * attr::suppress_aux is set.
+ */
+#define SUPPRESSABLE_FLAGS PERF_AUX_FLAG_OVERWRITE
+
/*
* Commit the data written by hardware into the ring buffer by adjusting
* aux_head and posting a PERF_RECORD_AUX into the perf buffer. It is the
@@ -460,8 +466,11 @@ void perf_aux_output_end(struct perf_output_handle *handle, unsigned long size)
* Only send RECORD_AUX if we have something useful to communicate
*/

- perf_event_aux_event(handle->event, aux_head, size,
- handle->aux_flags);
+ if (!handle->event->attr.suppress_aux ||
+ (handle->aux_flags & ~(u64)SUPPRESSABLE_FLAGS)) {
+ perf_event_aux_event(handle->event, aux_head, size,
+ handle->aux_flags);
+ }
}

rb->user_page->aux_head = rb->aux_head;
--
2.15.1