Re: perf: Question about machine__create_extra_kernel_maps and trampoline symbols
From: Krzysztof Łopatowski
Date: Thu Feb 13 2025 - 13:18:56 EST
Hi Ian,
> We do have a kallsyms parsing benchmark:
Yes, I've looked at `perf bench internals kallsyms-parse`. On my machine it
reports:
  Average kallsyms__parse took: 99,994 ms (+- 0,199 ms)
However, this benchmark only measures the raw parsing speed of the kallsyms
file, without any of the symbol processing that happens in real usage.
> I was curious to know if the regression is also visible there?
You can call it a regression if you mean from 2018 ;-)
I included the measurements at the top of the thread to give a sense of scale
and to show that this is not an already-solved problem.
The core issue is that we're calling kallsyms__parse() multiple times when
we could likely consolidate these calls, since most of the overhead comes
from reading and parsing the file, not from processing the symbols.
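To make the consolidation idea concrete, here is a minimal standalone sketch. It is not perf code: parse_once(), struct sym_consumer, and the two example callbacks are hypothetical names I made up for illustration, and the line format handled is just the usual "<hex address> <type> <name>" layout. The point is only that the file is read and tokenized once, and every interested consumer sees each symbol in that single pass.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-symbol callback; a non-zero return aborts the parse. */
typedef int (*sym_cb_t)(void *arg, const char *name, char type, uint64_t start);

/* One registered consumer: a callback plus its private state. */
struct sym_consumer {
	sym_cb_t cb;
	void *arg;
};

/*
 * Read "<hex addr> <type> <name>" lines from fp exactly once and hand
 * every parsed symbol to all n consumers, instead of re-reading the
 * file once per consumer.
 */
static int parse_once(FILE *fp, const struct sym_consumer *consumers, size_t n)
{
	char line[512], name[256], type;
	uint64_t start;

	while (fgets(line, sizeof(line), fp)) {
		if (sscanf(line, "%" SCNx64 " %c %255s", &start, &type, name) != 3)
			continue; /* skip lines that do not match the format */

		for (size_t i = 0; i < n; i++) {
			int err = consumers[i].cb(consumers[i].arg, name, type, start);

			if (err)
				return err;
		}
	}
	return 0;
}

/* Example consumer: count every symbol seen. */
static int count_symbols(void *arg, const char *name, char type, uint64_t start)
{
	(void)name;
	(void)type;
	(void)start;
	++*(size_t *)arg;
	return 0;
}

/* Example consumer: remember the address of one named symbol. */
struct find_sym {
	const char *name;
	uint64_t addr;
	int found;
};

static int find_symbol(void *arg, const char *name, char type, uint64_t start)
{
	struct find_sym *f = arg;

	(void)type;
	if (!f->found && strcmp(name, f->name) == 0) {
		f->addr = start;
		f->found = 1;
	}
	return 0;
}
```

A caller would open the file once, build an array with one struct sym_consumer per user (for example, one that builds the symbol table and one that looks for a specific symbol), and pass that array to parse_once(). Whether the real call sites in perf can be arranged this way depends on their ordering, which I have not checked in detail.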
Notably, the third call I mentioned (in machine__create_extra_kernel_maps)
accounts for about half of the total kallsyms parsing time, yet appears to
have no effect on my test system. This is why I'm questioning whether we
need to keep this functionality.
Ultimately, I believe we should explore ways to avoid reading /proc/kallsyms
altogether, given how expensive this operation is.
Best regards,
Krzysztof