Re: [PATCH V7 5/6] arm64/perf: Add branch stack support in ARMV8 PMU

From: Mark Rutland
Date: Wed Feb 08 2023 - 14:36:09 EST


On Fri, Jan 13, 2023 at 10:41:51AM +0530, Anshuman Khandual wrote:
>
>
> On 1/12/23 19:59, Mark Rutland wrote:
> > On Thu, Jan 05, 2023 at 08:40:38AM +0530, Anshuman Khandual wrote:
> >> @@ -878,6 +890,13 @@ static irqreturn_t armv8pmu_handle_irq(struct arm_pmu *cpu_pmu)
> >> if (!armpmu_event_set_period(event))
> >> continue;
> >>
> >> + if (has_branch_stack(event)) {
> >> + WARN_ON(!cpuc->branches);
> >> + armv8pmu_branch_read(cpuc, event);
> >> + data.br_stack = &cpuc->branches->branch_stack;
> >> + data.sample_flags |= PERF_SAMPLE_BRANCH_STACK;
> >> + }
> >
> > How do we ensure the data we're getting isn't changed under our feet? Is BRBE
> > disabled at this point?
>
> Right, BRBE is paused after a PMU IRQ. We also ensure the buffer is disabled for
> all exception levels, i.e. BRBCR_EL1.E0BRE/E1BRE are removed from the configuration,
> before initiating the actual read, which eventually populates data.br_stack.

Ok; just to confirm, what exactly is the condition that enforces that BRBE is
disabled? Is that *while* there's an overflow asserted, or does something else
get set at the instant the overflow occurs?

What exactly is necessary for it to start again?

> > Is this going to have branches after taking the exception, or does BRBE stop
> > automatically at that point? If so we presumably need to take special care as
> > to when we read this relative to enabling/disabling and/or manipulating the
> > overflow bits.
>
> The default BRBE configuration includes setting BRBCR_EL1.FZP, enabling BRBE to
> be paused automatically, right after a PMU IRQ. Regardless, before reading the
> buffer, BRBE is paused (BRBFCR_EL1.PAUSED) and disabled for all privilege levels
> by clearing BRBCR_EL1.E0BRE/E1BRE, which ensures that no new branch records get
> into the buffer while it is being read out for the perf ring buffer.

Ok; I think we could do with some comments in the code explaining that.
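
Perhaps something along the lines of the below in the read path, making the
pause/disable requirement explicit. This is just a sketch based on your
description above; the helper name is made up, and the SYS_BRB* / BRB*
definitions are assumed to be whatever this series ends up defining for
those registers and fields:

	/*
	 * With BRBCR_EL1.FZP set, BRBE pauses (BRBFCR_EL1.PAUSED becomes set)
	 * when the PMU overflow occurs, so the records are already frozen by
	 * the time the IRQ handler runs. Before draining them, also clear
	 * BRBCR_EL1.{E0BRE,E1BRE} so that no new records can be generated
	 * while the buffer is read out for the perf sample.
	 */
	static void armv8pmu_branch_read_sketch(struct pmu_hw_events *cpuc,
						struct perf_event *event)
	{
		u64 brbcr = read_sysreg_s(SYS_BRBCR_EL1);
		u64 brbfcr = read_sysreg_s(SYS_BRBFCR_EL1);

		/* Disable record generation at EL0 and EL1 before reading. */
		write_sysreg_s(brbcr & ~(BRBCR_EL1_E0BRE | BRBCR_EL1_E1BRE),
			       SYS_BRBCR_EL1);
		isb();

		/* ... drain BRBSRC/BRBTGT/BRBINF records into cpuc->branches ... */

		/* Re-enable record generation and unpause for the next session. */
		write_sysreg_s(brbcr, SYS_BRBCR_EL1);
		write_sysreg_s(brbfcr & ~BRBFCR_EL1_PAUSED, SYS_BRBFCR_EL1);
		isb();
	}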

>
> >
> >> +
> >> /*
> >> * Perf event overflow will queue the processing of the event as
> >> * an irq_work which will be taken care of in the handling of
> >> @@ -976,6 +995,14 @@ static int armv8pmu_user_event_idx(struct perf_event *event)
> >> return event->hw.idx;
> >> }
> >>
> >> +static void armv8pmu_sched_task(struct perf_event_pmu_context *pmu_ctx, bool sched_in)
> >> +{
> >> + struct arm_pmu *armpmu = to_arm_pmu(pmu_ctx->pmu);
> >> +
> >> + if (sched_in && arm_pmu_branch_stack_supported(armpmu))
> >> + armv8pmu_branch_reset();
> >> +}
> >
> > When scheduling out, shouldn't we save what we have so far?
> >
> > It seems odd that we just throw that away rather than placing it into a FIFO.
>
> IIRC we had discussed this earlier; the save and restore mechanism will be added
> later, not during this enablement patch series.

Sorry, but why?

I don't understand why it's acceptable to non-deterministically throw away data
for now. At the least that's going to confuse users, especially as the
observable behaviour may change if and when that's added later.

I assume that there's some reason that it's painful to do that? Could you
please elaborate on that?

> For now, resetting the buffer ensures that branch records from one session
> do not get into another.

I agree that it's necessary to do that, but as above I don't believe it's
sufficient.

> Note that these branches cannot be pushed into the perf ring buffer either, as
> there was no corresponding PMU interrupt to associate them with.

I'm not suggesting we put it in the perf ring buffer; I'm suggesting that we
snapshot it into *some* kernel-internal storage, then later reconcile that.
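
Something like the below is the rough shape I had in mind (a completely
untested sketch; armv8pmu_branch_save() is hypothetical, not something this
series defines):

	static void armv8pmu_sched_task(struct perf_event_pmu_context *pmu_ctx,
					bool sched_in)
	{
		struct arm_pmu *armpmu = to_arm_pmu(pmu_ctx->pmu);

		if (!arm_pmu_branch_stack_supported(armpmu))
			return;

		if (sched_in) {
			/* Start the incoming task with a clean buffer. */
			armv8pmu_branch_reset();
			return;
		}

		/*
		 * Hypothetical: on sched-out, drain the live records into
		 * some kernel-internal (e.g. per-event or per-task) stash,
		 * so that a later sample can reconcile them rather than
		 * having them silently discarded.
		 */
		armv8pmu_branch_save(pmu_ctx);
	}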

Maybe that's far more painful than I expect?

Thanks,
Mark.