Re: [RFC 00/48] perf tools: Introduce data type profiling (v1)

From: Ingo Molnar
Date: Thu Oct 12 2023 - 02:03:19 EST



* Namhyung Kim <namhyung@xxxxxxxxxx> wrote:

> * How to use it
>
> To get precise memory access samples, users can use the `perf mem record`
> command, which picks the events supported by their architecture. Intel
> machines work best as they have dedicated memory access events, but
> those events filter out low-latency loads by default (anything under 30
> cycles; use the --ldlat option to change the threshold).
>
> # To get memory access samples in kernel for 1 second (on Intel)
> $ sudo perf mem record -a -K --ldlat=4 -- sleep 1
>
> # Similar for AMD (but it requires a 6.3+ kernel for BPF filters)
> $ sudo perf mem record -a --filter 'mem_op == load, ip > 0x8000000000000000' -- sleep 1

BTW, it would be nice for 'perf mem record' to just do the right thing on
whatever machine it is running on.

Also, why are BPF filters required - due to the IP filtering of mem-load
events?

Could we perhaps add an IP filter to perf events to get this built-in?
Perhaps attr->exclude_user would achieve something similar?
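
To make the comparison concrete, here is a minimal sketch of both approaches, assuming the PMU behind the memory-access event honours the exclude_user bit (not confirmed in this thread for the AMD case). The helper names ip_is_kernel() and init_kernel_only_attr() are made up for illustration, and the type/config pair is a plain CPU-cycles placeholder rather than the real PMU-specific memory-load event:

```c
#include <linux/perf_event.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Same comparison as the 'ip > 0x8000000000000000' filter term quoted
 * above: on x86-64 the kernel lives in the upper half of the address space.
 * (Hypothetical helper, not part of perf.)
 */
bool ip_is_kernel(uint64_t ip)
{
	return ip > 0x8000000000000000ULL;
}

/*
 * Ask the kernel itself to skip user-space samples via the attr bit.
 * The type/config values are a placeholder event (CPU cycles), NOT the
 * actual memory-access event, which differs per PMU.
 */
void init_kernel_only_attr(struct perf_event_attr *attr)
{
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->type = PERF_TYPE_HARDWARE;
	attr->config = PERF_COUNT_HW_CPU_CYCLES;
	attr->sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_ADDR;
	attr->exclude_user = 1;	/* kernel-only, no BPF filter involved */
}
```

With the attr bit the filtering happens before a sample is generated; with the IP comparison every sample is taken first and then dropped if it fails the test.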

> In perf report, it's just a matter of selecting the new sort keys: 'type'
> and 'typeoff'. The 'type' key shows the name of the data type as a whole,
> while 'typeoff' shows the name of the field in the data type. I found it
> useful to combine them with the --hierarchy option to group relevant
> entries at the same level.
>
> $ sudo perf report -s type,typeoff --hierarchy --stdio
> ...
> #
> #    Overhead  Data Type / Data Type Offset
> # ...........  ............................
> #
>     23.95%     (stack operation)
>        23.95%     (stack operation) +0 (no field)
>     23.43%     (unknown)
>        23.43%     (unknown) +0 (no field)
>     10.30%     struct pcpu_hot
>         4.80%     struct pcpu_hot +0 (current_task)
>         3.53%     struct pcpu_hot +8 (preempt_count)
>         1.88%     struct pcpu_hot +12 (cpu_number)
>         0.07%     struct pcpu_hot +24 (top_of_stack)
>         0.01%     struct pcpu_hot +40 (softirq_pending)
>      4.25%     struct task_struct
>         1.48%     struct task_struct +2036 (rcu_read_lock_nesting)
>         0.53%     struct task_struct +2040 (rcu_read_unlock_special.b.blocked)
>         0.49%     struct task_struct +2936 (cred)
>         0.35%     struct task_struct +3144 (audit_context)
>         0.19%     struct task_struct +46 (flags)
>         0.17%     struct task_struct +972 (policy)
>         0.15%     struct task_struct +32 (stack)
>         0.15%     struct task_struct +8 (thread_info.syscall_work)
>         0.10%     struct task_struct +976 (nr_cpus_allowed)
>         0.09%     struct task_struct +2272 (mm)
> ...

This looks really useful!
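
For anyone reading along, the 'typeoff' column is a byte offset into the type, resolved to the field that covers it. A rough sketch of that lookup follows, assuming an LP64 target. struct mock_pcpu_hot is a mock layout, not the kernel definition: only the five fields named in the report above come from the output, while call_depth and hardirq_stack_ptr are filler assumed here so the offsets line up. field_at() is a hypothetical helper, not how perf itself does it.

```c
#include <stddef.h>
#include <stdint.h>

/* Mock layout only; see the assumptions stated above. */
struct mock_pcpu_hot {
	void          *current_task;       /* +0  */
	int            preempt_count;      /* +8  */
	int            cpu_number;         /* +12 */
	uint64_t       call_depth;         /* +16, assumed filler */
	unsigned long  top_of_stack;       /* +24 */
	void          *hardirq_stack_ptr;  /* +32, assumed filler */
	uint16_t       softirq_pending;    /* +40 */
};

struct field_desc {
	const char *name;
	size_t offset;
	size_t size;
};

#define FIELD(f) \
	{ #f, offsetof(struct mock_pcpu_hot, f), \
	  sizeof(((struct mock_pcpu_hot *)0)->f) }

static const struct field_desc mock_fields[] = {
	FIELD(current_task),
	FIELD(preempt_count),
	FIELD(cpu_number),
	FIELD(call_depth),
	FIELD(top_of_stack),
	FIELD(hardirq_stack_ptr),
	FIELD(softirq_pending),
};

/* Return the name of the field covering byte offset 'off', if any. */
const char *field_at(size_t off)
{
	size_t i;

	for (i = 0; i < sizeof(mock_fields) / sizeof(mock_fields[0]); i++) {
		const struct field_desc *f = &mock_fields[i];

		if (off >= f->offset && off < f->offset + f->size)
			return f->name;
	}
	return "(no field)";	/* padding or out of range */
}
```

An access that lands in the middle of a field still resolves to that field, since the lookup is a range check rather than an exact match.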

Thanks,

Ingo