Re: [PATCH V4 03/10] perf: Extend branch type classification

From: Robin Murphy
Date: Tue Mar 15 2022 - 09:06:56 EST


On 2022-03-15 11:22, Peter Zijlstra wrote:
On Tue, Mar 15, 2022 at 11:05:09AM +0530, Anshuman Khandual wrote:
branch_entry.type has now run out of space to accommodate more branch type
classifications. This will prevent the perf branch stack implementation on arm64
(via BRBE) from capturing all available branch types. Extending this bit field,
i.e. branch_entry.type [4 bits], is not an option as it would break the user space
ABI for both little and big endian perf tools.

Extend branch classification with a new field branch_entry.new_type via a
new branch type PERF_BR_EXTEND_ABI in branch_entry.type. Perf tools which
can decode PERF_BR_EXTEND_ABI will then parse branch_entry.new_type as
well.

branch_entry.new_type is a 4-bit field which can hold up to 16 branch types.
The first three branch types will hold various generic page faults, followed
by five architecture-specific branch types, which can be overridden by the
platform for specific use cases. These architecture-specific branch types
get overridden on the arm64 platform for the BRBE implementation.

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 26d8f0b5ac0d..d29280adc3c4 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -255,9 +255,22 @@ enum {
PERF_BR_IRQ = 12, /* irq */
PERF_BR_SERROR = 13, /* system error */
PERF_BR_NO_TX = 14, /* not in transaction */
+ PERF_BR_EXTEND_ABI = 15, /* extend ABI */
PERF_BR_MAX,
};


#define PERF_SAMPLE_BRANCH_PLM_ALL \
(PERF_SAMPLE_BRANCH_USER|\
PERF_SAMPLE_BRANCH_KERNEL|\
@@ -1372,7 +1385,8 @@ struct perf_branch_entry {
abort:1, /* transaction abort */
cycles:16, /* cycle count to last branch */
type:4, /* branch type */
- reserved:40;
+ new_type:4, /* additional branch type */
+ reserved:36;
};

Hurmpf... this will effectively give us 5 bits of space for the cost of
8, that seems... unfortunate.

Would something like:

type:4,
ext_type:4,
reserved:36;

and have all software do:

type = pbe->type | (pbe->ext_type << 4);

Then old software will only know about the old types. New software on
old kernels will add 4 0's, which is harmless, while new software on new
kernels will get 8 bits of type.

Would that work?

Depends how bad the effects of aliasing in existing software would be, I guess - e.g. new kernel outputs type 0x23 which software then interprets as 0x3 since it doesn't know about the extended bits. I'm guessing that's more likely "confusing to the user" than "catastrophically fatal", but it might still matter.
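
To make the aliasing case concrete, here is a rough userspace sketch. The struct is a hypothetical stand-in for just the two type fields of the proposed layout, not the real struct perf_branch_entry, and encode()/decode_old()/decode_new() are names made up for illustration only:

```c
#include <assert.h>

/* Hypothetical stand-in for the type bits in the proposed layout;
 * NOT the real UAPI struct perf_branch_entry. */
struct branch_type_bits {
	unsigned int type:4;     /* existing 4-bit branch type */
	unsigned int ext_type:4; /* proposed upper 4 bits */
};

/* What a new kernel would conceptually do: split an 8-bit type. */
static struct branch_type_bits encode(unsigned int full_type)
{
	struct branch_type_bits b;

	assert(full_type <= 0xff);
	b.type = full_type & 0xf;
	b.ext_type = (full_type >> 4) & 0xf;
	return b;
}

/* New software: combine both fields as suggested above. */
static unsigned int decode_new(const struct branch_type_bits *b)
{
	return b->type | (b->ext_type << 4);
}

/* Old software: only knows about the original 4-bit field. */
static unsigned int decode_old(const struct branch_type_bits *b)
{
	return b->type;
}
```

With encode(0x23), decode_new() gives back 0x23 while decode_old() sees 0x3, i.e. an extended type silently shows up as whichever existing type shares its low nibble.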

If software had an explicit opt-in to receiving extended types when requesting branch sampling in the first place we could avoid that worry, but then we'd need some additional complexity to sanitise records depending on that option :/
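
As a sketch of what that sanitising might look like, purely as an assumption: wants_ext_types stands for a hypothetical opt-in that does not exist in the current ABI, and falling back to the "unknown" type (0, mirroring PERF_BR_UNKNOWN) is just one possible policy, not something proposed in this thread:

```c
#include <stdbool.h>

/* Mirrors PERF_BR_UNKNOWN (0) from perf_event.h; redefined here so
 * the sketch stands alone. */
#define SKETCH_BR_UNKNOWN 0u

/* Hypothetical stand-in for the type bits of a branch record. */
struct sketch_branch_type {
	unsigned int type:4;     /* existing 4-bit branch type */
	unsigned int ext_type:4; /* proposed extension bits */
};

/* Hypothetical helper: hide extended types from a consumer that did
 * not opt in, so it never sees an aliased low nibble. */
static void sketch_sanitise(struct sketch_branch_type *b, bool wants_ext_types)
{
	/* Records that fit in the old 4 bits need no treatment. */
	if (wants_ext_types || b->ext_type == 0)
		return;

	/* Assumed policy: report "unknown" rather than a wrong type. */
	b->type = SKETCH_BR_UNKNOWN;
	b->ext_type = 0;
}
```

The extra cost is that every record has to go through a check like this on the output path, keyed off whatever the event was opened with.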

Robin.