[PATCH v0 0/2] perf: Allow forcing high-order allocation for AUX buffers

From: Alexander Shishkin
Date: Wed Feb 13 2019 - 06:47:56 EST

Hi Peter and Arnaldo,

It turns out that using high-order allocations for AUX buffers reduces the
run-time overhead of tracing, for example with Intel PT. The assumption is
that this comes from not having to fetch the next page's address at every
page boundary. Given a workload that does a lot of indirect branches (thus
generating more PT data with addresses of branch targets), it takes around
6% longer to complete under PT in snapshot mode than without PT, but only
around 4% if we use high-order output regions instead of single page output
regions. This is measured on an Atom CPU.
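
To give a feel for how many page-boundary crossings are at stake, here is a
rough sketch (not code from this series) that counts the physically
contiguous output regions needed to cover an AUX buffer for a given maximum
allocation order. The helper names floor_log2() and aux_region_count() are
made up for illustration, and the model assumes every allocation succeeds
at the order it asks for, which a real system does not guarantee.

```c
#include <stddef.h>

/* Largest k such that (1 << k) <= n; n must be non-zero. */
static unsigned int floor_log2(size_t n)
{
	unsigned int k = 0;

	while (n >>= 1)
		k++;
	return k;
}

/*
 * Hypothetical model: count the contiguous regions needed to cover
 * nr_pages when one region may span at most (1 << max_order) pages.
 * Each step greedily takes the largest power-of-two chunk that fits
 * in what is left, capped at max_order.
 */
size_t aux_region_count(size_t nr_pages, unsigned int max_order)
{
	size_t regions = 0;

	while (nr_pages) {
		unsigned int order = floor_log2(nr_pages);

		if (order > max_order)
			order = max_order;
		nr_pages -= (size_t)1 << order;
		regions++;
	}
	return regions;
}
```

Under this model, a buffer of 1024 pages needs 1024 regions with
max_order 0 and 2 regions with max_order 9, so the hardware would have far
fewer region boundaries to cross while writing the same amount of data.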

We already use high-order allocations for PMUs that don't do hardware
scatter-gather (HW SG), like Intel PT on BDW. This patchset adds an
attribute bit that enables the same for the PMUs that do have HW SG, and a
command line option for perf record to set this bit.

Alexander Shishkin (2):
  perf: Add an option to ask for high order allocations for AUX buffers
  perf record: Add --aux-highorder

 include/uapi/linux/perf_event.h       | 3 ++-
 kernel/events/core.c                  | 3 +++
 kernel/events/ring_buffer.c           | 3 ++-
 tools/include/uapi/linux/perf_event.h | 3 ++-
 tools/perf/builtin-record.c           | 2 ++
 tools/perf/perf.h                     | 1 +
 tools/perf/util/evsel.c               | 9 +++++++++
 7 files changed, 21 insertions(+), 3 deletions(-)