Re: [PATCH V2 3/7] arm64/perf: Update struct pmu_hw_events for BRBE

From: Anshuman Khandual
Date: Tue Sep 13 2022 - 23:39:27 EST




On 9/13/22 17:13, Mark Brown wrote:
> On Tue, Sep 13, 2022 at 11:03:45AM +0530, Anshuman Khandual wrote:
>> On 9/12/22 15:42, Mark Brown wrote:
>
>>> like it would be clearer and safer to allocate these dynamically
>>> when BRBE is used if that's possible, I'd expect that should also
>>> deal with the stack frame size issues as well.
>
>> That might not be possible because the generic 'struct perf_branch_stack'
>> expects 'perf_branch_stack.entries' to be a variable array which is also
>> contiguous in memory, with other elements in 'perf_branch_stack'. Besides
>> that will be a deviation from similar implementations on x86 and powerpc
>> platforms.
>
>> The stack frame size came up because BRBE_MAX_ENTRIES is 64 compared to
>> just 32 on other platforms, which follow the exact same method.
>
> If this is a pattern used by other architectures and relied on by
> the core that doesn't mean it's impossible to do anything, it
> means that the existing code needs to be updated to allow the
> larger number of entries for BRBE if we want to change things.
> That is a lot of effort of course so something that moves the
> allocation off the stack would be more expedient in the short
> term.

Something like the following change moves the buffer allocation off the stack,
although it requires updating the driver, and buffer assignment during a PMU
interrupt. But it does seem to work (will require some more testing).

diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h
index 3e7757d05146..a3401122d855 100644
--- a/include/linux/perf/arm_pmu.h
+++ b/include/linux/perf/arm_pmu.h
@@ -52,6 +52,12 @@ static_assert((PERF_EVENT_FLAG_ARCH & ARMPMU_EVT_PRIV) == ARMPMU_EVT_PRIV);
*/
#define BRBE_MAX_ENTRIES 64

+/* Captured BRBE buffer - copied as is into perf_sample_data */
+struct brbe_records {
+ struct perf_branch_stack brbe_stack;
+ struct perf_branch_entry brbe_entries[BRBE_MAX_ENTRIES];
+};
+
/* The events for a given PMU register set. */
struct pmu_hw_events {
/*
@@ -92,9 +98,7 @@ struct pmu_hw_events {
unsigned int brbe_users;
void *brbe_context;

- /* Captured BRBE buffer - copied as is into perf_sample_data */
- struct perf_branch_stack brbe_stack;
- struct perf_branch_entry brbe_entries[BRBE_MAX_ENTRIES];
+ struct brbe_records *branch_records;
};
diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
index 05848c6d955c..2f0957519307 100644
--- a/drivers/perf/arm_pmu.c
+++ b/drivers/perf/arm_pmu.c
@@ -951,6 +951,13 @@ static struct arm_pmu *__armpmu_alloc(gfp_t flags)
goto out_free_pmu;
}

+ for_each_possible_cpu(cpu) {
+ struct pmu_hw_events *events = per_cpu_ptr(pmu->hw_events, cpu);
+
+ events->branch_records = kmalloc(sizeof(struct brbe_records), flags);
+ WARN_ON(!events->branch_records);
+ }