Re: [PATCH V3 4/7] driver/perf/arm_pmu_platform: Add support for BRBE attributes detection

From: Anshuman Khandual
Date: Tue Oct 11 2022 - 05:21:50 EST




On 10/10/22 19:47, James Clark wrote:
>
>
> On 06/10/2022 14:37, James Clark wrote:
>>
>>
>> On 29/09/2022 08:58, Anshuman Khandual wrote:
>>> This adds arm pmu infrastrure to probe BRBE implementation's attributes via
>>> driver exported callbacks later. The actual BRBE feature detection will be
>>> added by the driver itself.
>>>
>>> CPU specific BRBE entries, cycle count, format support gets detected during
>>> PMU init. This information gets saved in per-cpu struct pmu_hw_events which
>>> later helps in operating BRBE during a perf event context.
>>>
>>> Cc: Will Deacon <will@xxxxxxxxxx>
>>> Cc: Mark Rutland <mark.rutland@xxxxxxx>
>>> Cc: linux-arm-kernel@xxxxxxxxxxxxxxxxxxx
>>> Cc: linux-kernel@xxxxxxxxxxxxxxx
>>> Signed-off-by: Anshuman Khandual <anshuman.khandual@xxxxxxx>
>>> ---
>>> drivers/perf/arm_pmu_platform.c | 34 +++++++++++++++++++++++++++++++++
>>> 1 file changed, 34 insertions(+)
>>>
>>> diff --git a/drivers/perf/arm_pmu_platform.c b/drivers/perf/arm_pmu_platform.c
>>> index 933b96e243b8..acdc445081aa 100644
>>> --- a/drivers/perf/arm_pmu_platform.c
>>> +++ b/drivers/perf/arm_pmu_platform.c
>>> @@ -172,6 +172,36 @@ static int armpmu_request_irqs(struct arm_pmu *armpmu)
>>> return err;
>>> }
>>>
>>> +static void arm_brbe_probe_cpu(void *info)
>>> +{
>>> + struct pmu_hw_events *hw_events;
>>> + struct arm_pmu *armpmu = info;
>>> +
>>> + /*
>>> + * Return from here, if BRBE driver has not been
>>> + * implemented for this PMU. This helps prevent
>>> + * kernel crash later when brbe_probe() will be
>>> + * called on the PMU.
>>> + */
>>> + if (!armpmu->brbe_probe)
>>> + return;
>>> +
>>> + hw_events = per_cpu_ptr(armpmu->hw_events, smp_processor_id());
>>> + armpmu->brbe_probe(hw_events);
>>> +}
>>> +
>>> +static int armpmu_request_brbe(struct arm_pmu *armpmu)
>>> +{
>>> + int cpu, err = 0;
>>> +
>>> + for_each_cpu(cpu, &armpmu->supported_cpus) {
>>> + err = smp_call_function_single(cpu, arm_brbe_probe_cpu, armpmu, 1);
>>
>> Hi Anshuman,
>>
>> I have LOCKDEP on and the patchset applied to perf/core (82aad7ff7) on
>> git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git and I get
>
> Can you confirm if this is currently the correct place to apply this to?

This series applied on v6.0-rc5 after the perf ABI changes, both in kernel
and in user space tools.

> I'm only getting 0 length branch stacks now. Seems like it could be
> something to do with the layout of perf samples because I know that was
> done in separate commits:

Right, might be.

>
> sudo ./perf record -j any_call -- ls
> ./perf report -D | grep "branch stack"
> ... branch stack: nr:0
> ... branch stack: nr:0
> ... branch stack: nr:0
> ... branch stack: nr:0

I am planning to respin the series on 6.1-rc1 next week which should solve
these multiple moving parts problem.

> ...
>
>> this:
>>
>> armv8-pmu pmu: hw perfevents: no interrupt-affinity property, guessing.
>> brbe: implementation found on cpu 0
>>
>> =============================
>> [ BUG: Invalid wait context ]
>> 6.0.0-rc7 #38 Not tainted
>> -----------------------------
>> kworker/u8:0/9 is trying to lock:
>> ffff000800855898 (&port_lock_key){....}-{3:3}, at:
>> pl011_console_write+0x148/0x240
>> other info that might help us debug this:
>> context-{2:2}
>> 5 locks held by kworker/u8:0/9:
>> #0: ffff00080032a138 ((wq_completion)eval_map_wq){+.+.}-{0:0}, at:
>> process_one_work+0x200/0x6b0
>> #1: ffff80000807bde0
>> ((work_completion)(&eval_map_work)){+.+.}-{0:0}, at:
>> process_one_work+0x200/0x6b0
>> #2: ffff80000aa3db70 (trace_event_sem){+.+.}-{4:4}, at:
>> trace_event_eval_update+0x28/0x420
>> #3: ffff80000a9afe58 (console_lock){+.+.}-{0:0}, at:
>> vprintk_emit+0x130/0x380
>> #4: ffff80000a9aff78 (console_owner){-...}-{0:0}, at:
>> console_emit_next_record.constprop.0+0x128/0x338
>> stack backtrace:
>> CPU: 0 PID: 9 Comm: kworker/u8:0 Not tainted 6.0.0-rc7 #38
>> Hardware name: Foundation-v8A (DT)
>> Workqueue: eval_map_wq eval_map_work_func
>> Call trace:
>> dump_backtrace+0x114/0x120
>> show_stack+0x20/0x58
>> dump_stack_lvl+0x9c/0xd8
>> dump_stack+0x18/0x34
>> __lock_acquire+0x17cc/0x1920
>> lock_acquire+0x138/0x3b8
>> _raw_spin_lock+0x58/0x70
>> pl011_console_write+0x148/0x240
>> console_emit_next_record.constprop.0+0x194/0x338
>> console_unlock+0x18c/0x208
>> vprintk_emit+0x24c/0x380
>> vprintk_default+0x40/0x50
>> vprintk+0xd4/0xf0
>> _printk+0x68/0x90
>> arm64_pmu_brbe_probe+0x10c/0x128
>> armv8pmu_brbe_probe+0x18/0x28
>> arm_brbe_probe_cpu+0x44/0x58
>> __flush_smp_call_function_queue+0x1d0/0x440
>> generic_smp_call_function_single_interrupt+0x20/0x78
>> ipi_handler+0x98/0x368
>> handle_percpu_devid_irq+0xc0/0x3a8
>> generic_handle_domain_irq+0x34/0x50
>> gic_handle_irq+0x58/0x138
>> call_on_irq_stack+0x2c/0x58
>> do_interrupt_handler+0x88/0x90
>> el1_interrupt+0x40/0x78
>> el1h_64_irq_handler+0x18/0x28
>> el1h_64_irq+0x64/0x68
>> trace_event_eval_update+0x114/0x420
>> eval_map_work_func+0x30/0x40
>> process_one_work+0x298/0x6b0
>> worker_thread+0x54/0x408
>> kthread+0x118/0x128
>> ret_from_fork+0x10/0x20
>> brbe: implementation found on cpu 1
>> brbe: implementation found on cpu 2
>> brbe: implementation found on cpu 3
>>
>>> + if (err)
>>> + return err;
>>> + }
>>> + return err;
>>> +}
>>> +
>>> static void armpmu_free_irqs(struct arm_pmu *armpmu)
>>> {
>>> int cpu;
>>> @@ -229,6 +259,10 @@ int arm_pmu_device_probe(struct platform_device *pdev,
>>> if (ret)
>>> goto out_free_irqs;
>>>
>>> + ret = armpmu_request_brbe(pmu);
>>> + if (ret)
>>> + goto out_free_irqs;
>>> +
>>> ret = armpmu_register(pmu);
>>> if (ret) {
>>> dev_err(dev, "failed to register PMU devices!\n");