Re: perf: Question about machine__create_extra_kernel_maps and trampoline symbols
From: Ian Rogers
Date: Thu Feb 13 2025 - 13:22:35 EST
On Thu, Feb 13, 2025 at 10:17 AM Krzysztof Łopatowski
<krzysztof.m.lopatowski@xxxxxxxxx> wrote:
>
> Hi Ian,
>
> > We do have a kallsyms parsing benchmark:
>
> Yes, I've looked at `perf bench internals kallsyms-parse`. On my system it
> reports:
>   Average kallsyms__parse took: 99,994 ms (+- 0,199 ms)
> However, this benchmark only measures the raw parsing speed of the kallsyms
> file, without any of the symbol processing that happens in real usage.
>
> > I was curious to know if the regression is also visible there?
>
> You can call it a regression if you mean from 2018 ;-)
> I gave measurements at the top to give a sense of scale and show it's not
> an already solved problem.
>
> The core issue is that we're calling 'kallsyms__parse' multiple times, when
> we could likely consolidate these calls since most of the overhead comes
> from reading and parsing, not from processing the symbols.
>
> Notably, the third call I mentioned (in machine__create_extra_kernel_maps)
> accounts for about half of the total kallsyms parsing time, yet appears to
> have no effect on my test system. This is why I'm questioning whether we
> need to keep this functionality.
>
> Ultimately, I believe we should explore ways to avoid reading /proc/kallsyms
> altogether, given how expensive this operation is.

Agreed. We had similarly expensive operations in event parsing, and those
have now largely been made lazy, so you can craft your command line to
avoid paying costs you don't need. I can't answer your question, but
adding the symbol processing to the benchmark seems like it would have
value.
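
To make the "lazy, parse once" idea concrete, here is a rough sketch in
Python rather than perf's C. It is not how perf is implemented: the names
`LazyKallsyms`, `find_symbol` and `module_symbols`, and the sample symbol
table, are all made up for illustration. It only shows several consumers
sharing a single parse of kallsyms-style text instead of each re-reading it.

```python
import io
from typing import Callable, List, NamedTuple, Optional, TextIO


class Sym(NamedTuple):
    addr: int
    kind: str
    name: str
    module: Optional[str]


def parse_kallsyms(stream: TextIO) -> List[Sym]:
    """Parse '<hex addr> <type> <name> [module]' lines; skip malformed ones."""
    syms = []
    for line in stream:
        parts = line.split()
        if len(parts) < 3:
            continue
        try:
            addr = int(parts[0], 16)
        except ValueError:
            continue
        module = parts[3].strip("[]") if len(parts) > 3 else None
        syms.append(Sym(addr, parts[1], parts[2], module))
    return syms


class LazyKallsyms:
    """Hypothetical helper: parse on first use, then reuse the result."""

    def __init__(self, opener: Callable[[], TextIO]):
        self._opener = opener
        self._syms: Optional[List[Sym]] = None
        self.parse_count = 0  # how many times the source was actually parsed

    def symbols(self) -> List[Sym]:
        if self._syms is None:
            with self._opener() as f:
                self._syms = parse_kallsyms(f)
            self.parse_count += 1
        return self._syms


def find_symbol(table: LazyKallsyms, name: str) -> Optional[Sym]:
    """First consumer: look a symbol up by name."""
    return next((s for s in table.symbols() if s.name == name), None)


def module_symbols(table: LazyKallsyms) -> List[Sym]:
    """Second consumer: symbols that belong to a module."""
    return [s for s in table.symbols() if s.module is not None]


# Invented sample data in the kallsyms line format.
SAMPLE = (
    "ffffffff81000000 T _text\n"
    "ffffffff81001000 T startup_64\n"
    "ffffffffc0200000 t helper_fn\t[dummy_mod]\n"
)

if __name__ == "__main__":
    table = LazyKallsyms(lambda: io.StringIO(SAMPLE))
    find_symbol(table, "_text")
    module_symbols(table)
    print("parses performed:", table.parse_count)
```

Swapping the opener for `lambda: open("/proc/kallsyms")` would run it
against the real file; note that unprivileged reads may show zeroed
addresses depending on the kptr_restrict setting. If no consumer ever asks
for symbols, nothing is parsed at all.
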
Thanks,
Ian