Re: [PATCH] perf: Allow suppressing AUX records

From: Peter Zijlstra
Date: Thu Mar 29 2018 - 07:54:48 EST


On Mon, Jan 15, 2018 at 05:00:20PM +0200, Alexander Shishkin wrote:
> It has been pointed out to me many times that it is useful to be able
> to switch off AUX records to save the bandwidth for records that actually
> matter, for example, in AUX overwrite mode.
>
> The usefulness of PERF_RECORD_AUX is in some of its flags, like the
> TRUNCATED flag that tells the decoder where exactly gaps in the trace are.
> The OVERWRITE flag, on the other hand will be set on every single record
> in overwrite mode. However, a PERF_RECORD_AUX[flags=OVERWRITE] is
> generated on every target task's sched_out, which over time adds up to
> a lot of useless information.
>
> In case the existing userspace depends on AUX records in the overwrite
> mode, we preserve the original behavior and add an opt-in for the new
> behavior, wherein the 'useless' records get suppressed.
>
> This patch adds an attribute bit to the described effect.
>
> Signed-off-by: Alexander Shishkin <alexander.shishkin@xxxxxxxxxxxxxxx>
> Cc: Markus Metzger <markus.t.metzger@xxxxxxxxx>
> Cc: Adrian Hunter <adrian.hunter@xxxxxxxxx>
> ---
> include/uapi/linux/perf_event.h | 3 ++-
> kernel/events/core.c | 5 +++++
> kernel/events/ring_buffer.c | 13 +++++++++++--
> 3 files changed, 18 insertions(+), 3 deletions(-)
>
> diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
> index c77c9a2ebbbb..d7a981130561 100644
> --- a/include/uapi/linux/perf_event.h
> +++ b/include/uapi/linux/perf_event.h
> @@ -370,7 +370,8 @@ struct perf_event_attr {
> context_switch : 1, /* context switch data */
> write_backward : 1, /* Write ring buffer from end to beginning */
> namespaces : 1, /* include namespaces data */
> - __reserved_1 : 35;
> + suppress_aux : 1, /* don't generate PERF_RECORD_AUX */
> + __reserved_1 : 34;
>
> union {
> __u32 wakeup_events; /* wakeup every n events */

So I'm basically fine with this patch, however I wonder if we really
need this suppress flag and can't just unconditionally drop these
events.

Ash said that as far as he knows no Intel-PT user actually relies on it;
Will is there anything ARM that is known to rely on them?

In anycase, tentative ACK on this, unless we wants to be brave and forgo
this flag.

Ingo, any opinions?


> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 4e1a1bf8d867..6245a88c2bda 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -10012,6 +10012,11 @@ SYSCALL_DEFINE5(perf_event_open,
> goto err_context;
> }
>
> + if (attr.suppress_aux && !pmu->setup_aux) {
> + err = -EINVAL;
> + goto err_context;
> + }
> +
> /*
> * Look up the group leader (we will attach this event to it):
> */
> diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
> index 141aa2ca8728..381f080e6409 100644
> --- a/kernel/events/ring_buffer.c
> +++ b/kernel/events/ring_buffer.c
> @@ -426,6 +426,12 @@ static bool __always_inline rb_need_aux_wakeup(struct ring_buffer *rb)
> return false;
> }
>
> +/*
> + * These flags won't generate a PERF_RECORD_AUX on their own if
> + * attr::suppress_aux is set.
> + */
> +#define SUPPRESSABLE_FLAGS PERF_AUX_FLAG_OVERWRITE
> +
> /*
> * Commit the data written by hardware into the ring buffer by adjusting
> * aux_head and posting a PERF_RECORD_AUX into the perf buffer. It is the
> @@ -460,8 +466,11 @@ void perf_aux_output_end(struct perf_output_handle *handle, unsigned long size)
> * Only send RECORD_AUX if we have something useful to communicate
> */
>
> - perf_event_aux_event(handle->event, aux_head, size,
> - handle->aux_flags);
> + if (!handle->event->attr.suppress_aux ||
> + (handle->aux_flags & ~(u64)SUPPRESSABLE_FLAGS)) {
> + perf_event_aux_event(handle->event, aux_head, size,
> + handle->aux_flags);
> + }
> }
>
> rb->user_page->aux_head = rb->aux_head;
> --
> 2.15.1
>