Re: Profiling of vdso_test_random

From: Adhemerval Zanella Netto
Date: Wed Sep 04 2024 - 09:29:19 EST




On 04/09/24 08:41, Christophe Leroy wrote:
> Hi,
>
> I've done a 'perf record' on vdso_test_random reduced to the vDSO test only, and I get the following function usage profile.
>
> Do you see the same kind of percentages on your platforms?
>
> I would have expected most of the time to be spent in __arch_chacha20_blocks_nostack() but that's in fact not the case.
>
> # Samples: 61K of event 'task-clock:ppp'
> # Event count (approx.): 15463500000
> #
> # Overhead  Command          Shared Object        Symbol
> # ........  ...............  ................... ....................................
> #
>     57.74%  vdso_test_getra  [vdso]               [.] __c_kernel_getrandom
>     22.49%  vdso_test_getra  [vdso]               [.] __arch_chacha20_blocks_nostack
>     10.80%  vdso_test_getra  vdso_test_getrandom  [.] test_vdso_getrandom
>      8.89%  vdso_test_getra  [vdso]               [.] __kernel_getrandom
>      0.01%  vdso_test_getra  [kernel.kallsyms]    [k] finish_task_switch.isra.0
>

After tinkering with the vDSO build parameters (I had to remove '-Bsymbolic'
and 'objdump -S') to get perf to show the symbols, I see the following on
aarch64 with a reduced vdso_test_random:

$ perf record ./vdso_test_getrandom bench-single
$ perf report
[...]
# Samples: 305 of event 'cycles:P'
# Event count (approx.): 5583551
#
# Overhead  Command          Shared Object        Symbol
# ........  ...............  ...................  .........................................
#
    44.27%  vdso_test_getra  [vdso]               [.] __arch_chacha20_blocks_nostack
    21.16%  vdso_test_getra  [vdso]               [.] __kernel_getrandom
     6.19%  vdso_test_getra  [kernel.kallsyms]    [k] task_mm_cid_work
     3.14%  vdso_test_getra  [kernel.kallsyms]    [k] perf_iterate_ctx
     2.96%  vdso_test_getra  vdso_test_getrandom  [.] test_vdso_getrandom
     2.48%  vdso_test_getra  [kernel.kallsyms]    [k] __memcg_slab_free_hook
     2.28%  vdso_test_getra  [kernel.kallsyms]    [k] next_uptodate_folio
     2.05%  vdso_test_getra  [kernel.kallsyms]    [k] _raw_spin_unlock_irq

This is what I would expect, so I am not sure why it might be different on powerpc.
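
To put the two profiles side by side, here is a minimal sketch that sums the
overhead per shared object and computes how much of the time spent inside the
vDSO goes to __arch_chacha20_blocks_nostack. The rows are copied from the two
reports in this thread; the helper names (parse, by_dso, share_within) are
made up for illustration. Note that the aarch64 listing is truncated ("[...]"),
so its totals cover only the rows shown, and it was taken with a different
event ('cycles:P' with 305 samples versus 'task-clock:ppp' with 61K samples).

```python
import re
from collections import defaultdict

# Rows copied verbatim from the two perf reports quoted in this thread.
POWERPC = """\
57.74%  vdso_test_getra  [vdso]               [.] __c_kernel_getrandom
22.49%  vdso_test_getra  [vdso]               [.] __arch_chacha20_blocks_nostack
10.80%  vdso_test_getra  vdso_test_getrandom  [.] test_vdso_getrandom
 8.89%  vdso_test_getra  [vdso]               [.] __kernel_getrandom
 0.01%  vdso_test_getra  [kernel.kallsyms]    [k] finish_task_switch.isra.0
"""

# Only the rows shown in the mail; the original report was truncated.
AARCH64 = """\
44.27%  vdso_test_getra  [vdso]               [.] __arch_chacha20_blocks_nostack
21.16%  vdso_test_getra  [vdso]               [.] __kernel_getrandom
 6.19%  vdso_test_getra  [kernel.kallsyms]    [k] task_mm_cid_work
 3.14%  vdso_test_getra  [kernel.kallsyms]    [k] perf_iterate_ctx
 2.96%  vdso_test_getra  vdso_test_getrandom  [.] test_vdso_getrandom
 2.48%  vdso_test_getra  [kernel.kallsyms]    [k] __memcg_slab_free_hook
 2.28%  vdso_test_getra  [kernel.kallsyms]    [k] next_uptodate_folio
 2.05%  vdso_test_getra  [kernel.kallsyms]    [k] _raw_spin_unlock_irq
"""

# overhead%, command, shared object, [privilege marker], symbol
ROW = re.compile(r"^\s*(\d+\.\d+)%\s+(\S+)\s+(\S+)\s+\[(.)\]\s+(\S+)\s*$")


def parse(report):
    """Return a list of (overhead_percent, shared_object, symbol) tuples."""
    rows = []
    for line in report.splitlines():
        m = ROW.match(line)
        if m:
            rows.append((float(m.group(1)), m.group(3), m.group(5)))
    return rows


def by_dso(rows):
    """Sum the overhead percentages per shared object."""
    totals = defaultdict(float)
    for pct, dso, _symbol in rows:
        totals[dso] += pct
    return dict(totals)


def share_within(rows, dso, symbol):
    """Percentage of the samples attributed to `dso` that landed in `symbol`."""
    total = sum(p for p, d, _s in rows if d == dso)
    hit = sum(p for p, d, s in rows if d == dso and s == symbol)
    return 100.0 * hit / total


for name, report in (("powerpc", POWERPC), ("aarch64", AARCH64)):
    rows = parse(report)
    chacha = share_within(rows, "[vdso]", "__arch_chacha20_blocks_nostack")
    print(f"{name}: per-DSO totals {by_dso(rows)}")
    print(f"{name}: chacha share of vDSO time {chacha:.1f}%")
```

On the numbers above, the ChaCha routine accounts for roughly a quarter of the
vDSO time on powerpc and roughly two thirds of it on aarch64. Also,
__c_kernel_getrandom does not show up as a separate symbol in the aarch64
rows, so a per-symbol comparison between the two platforms is not
like-for-like.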