Re: [PATCH v2 1/8] scripts/kallsyms: don't compress symbol type when CONFIG_KALLSYMS_ALL=y
From: Petr Mladek
Date: Tue Sep 20 2022 - 13:26:53 EST
On Fri 2022-09-09 21:00:09, Zhen Lei wrote:
> Currently, to search for a symbol, we need to expand the symbols in
> 'kallsyms_names' one by one, and then use the expanded string for
> comparison. This is very slow.
>
> In fact, we can first compress the name being looked up and then use
> it for comparison when traversing 'kallsyms_names'.
This does not explain how this patch modifies the compressed data
and why it is needed.
> This increases the size of 'kallsyms_names'. About 48KiB, 2.67%, on x86
> with defconfig.
> Before: kallsyms_num_syms=131392, sizeof(kallsyms_names)=1823659
> After : kallsyms_num_syms=131392, sizeof(kallsyms_names)=1872418
>
> However, if CONFIG_KALLSYMS_ALL is not set, the size of 'kallsyms_names'
> does not change.
>
> Signed-off-by: Zhen Lei <thunder.leizhen@xxxxxxxxxx>
> ---
> scripts/kallsyms.c | 15 ++++++++++++---
> 1 file changed, 12 insertions(+), 3 deletions(-)
>
> diff --git a/scripts/kallsyms.c b/scripts/kallsyms.c
> index f18e6dfc68c5839..ab6fe7cd014efd1 100644
> --- a/scripts/kallsyms.c
> +++ b/scripts/kallsyms.c
> @@ -60,6 +60,7 @@ static unsigned int table_size, table_cnt;
> static int all_symbols;
> static int absolute_percpu;
> static int base_relative;
> +static int sym_start_idx;
>
> static int token_profit[0x10000];
>
> @@ -511,7 +512,7 @@ static void learn_symbol(const unsigned char *symbol, int len)
> {
> int i;
>
> - for (i = 0; i < len - 1; i++)
> + for (i = sym_start_idx; i < len - 1; i++)
> token_profit[ symbol[i] + (symbol[i + 1] << 8) ]++;
This skips the first character in the @symbol string. I do not see how
this is used in the new code, for example, in
kallsyms_on_each_match_symbol(), in the 5th patch. It seems to iterate
the compressed data from the 0th index:
	for (i = 0, off = 0; i < kallsyms_num_syms; i++)
> }
>
> @@ -520,7 +521,7 @@ static void forget_symbol(const unsigned char *symbol, int len)
> {
> int i;
>
> - for (i = 0; i < len - 1; i++)
> + for (i = sym_start_idx; i < len - 1; i++)
> token_profit[ symbol[i] + (symbol[i + 1] << 8) ]--;
> }
>
> @@ -538,7 +539,7 @@ static unsigned char *find_token(unsigned char *str, int len,
> {
> int i;
>
> - for (i = 0; i < len - 1; i++) {
> + for (i = sym_start_idx; i < len - 1; i++) {
> if (str[i] == token[0] && str[i+1] == token[1])
> return &str[i];
> }
> @@ -780,6 +781,14 @@ int main(int argc, char **argv)
> } else if (argc != 1)
> usage();
>
> + /*
> + * Skip the symbol type, do not compress it to optimize the performance
> + * of finding or traversing symbols in kernel, this is good for modules
> + * such as livepatch.
I see. The type is added as the first character here, in read_symbol():
	static struct sym_entry *read_symbol(FILE *in)
	{
	[...]
		/* include the type field in the symbol name, so that it gets
		 * compressed together */
	[...]
		sym->sym[0] = type;
		strcpy(sym_name(sym), name);
This looks a bit crazy. read_symbol() plays a trick so that the type
gets compressed together with the name, and this patch adds another
trick to avoid exactly that.
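To make the interaction concrete, here is a minimal sketch of the two tricks, not the actual scripts/kallsyms.c code. pack_symbol(), learn_pairs() and pair_count() are hypothetical names used only for illustration. The start index is passed as a parameter here, whereas the patch reads it from the global sym_start_idx.

```c
#include <assert.h>
#include <string.h>

#define SYM_MAX 64

static int token_profit[0x10000];

/* Hypothetical helper: lay out a symbol the way read_symbol() does,
 * type byte first, name right after it. Returns the total length. */
static int pack_symbol(unsigned char *buf, char type, const char *name)
{
	size_t n = strlen(name);

	assert(n + 2 <= SYM_MAX);
	buf[0] = (unsigned char)type;
	memcpy(buf + 1, name, n + 1);
	return (int)n + 1;
}

/* Same loop shape as learn_symbol() in the patch, with the start
 * index passed in explicitly instead of read from a global. */
static void learn_pairs(const unsigned char *sym, int len, int start)
{
	int i;

	for (i = start; i < len - 1; i++)
		token_profit[sym[i] + (sym[i + 1] << 8)]++;
}

/* How often the byte pair (a, b) has been counted so far. */
static int pair_count(unsigned char a, unsigned char b)
{
	return token_profit[a + (b << 8)];
}
```

With start == 1 the pair made of the type byte and the first name character is never counted. It can therefore never become a token, and the type byte stays uncompressed at index 0.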
> + */
> + if (all_symbols)
> + sym_start_idx = 1;
This looks a bit fragile. My understanding is that the new code in
kernel/kallsyms.c and kernel/module/kallsyms.c depends on this change.

The faster search is used when CONFIG_KALLSYMS_ALL is defined.
But the data are compressed this way only when this script is called
with --all-symbols.

Is it guaranteed that this script will generate the needed data
whenever CONFIG_KALLSYMS_ALL is defined?
What about 3rd party modules?

I would personally suggest storing the symbol type in a separate
sym->type field of struct sym_entry and never compressing it.
IMHO, the size win is not worth the code complexity.

Well, people compiling the kernel for small devices might think
differently. But they probably disable kallsyms completely.
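
A rough sketch of what a separate type field could look like, under the assumption that the name buffer then holds only the name. struct sym_entry_sketch and new_symbol() are hypothetical and do not match the real struct sym_entry layout in scripts/kallsyms.c.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout: the type lives in its own field and the
 * name buffer holds only the name, so the compressor never sees it. */
struct sym_entry_sketch {
	unsigned long long addr;
	unsigned int len;	/* length of name[], without the NUL */
	char type;
	unsigned char name[];
};

/* Allocate an entry; the caller frees it with free(). */
static struct sym_entry_sketch *new_symbol(unsigned long long addr,
					   char type, const char *name)
{
	size_t n;
	struct sym_entry_sketch *sym;

	assert(name != NULL);
	n = strlen(name);
	sym = malloc(sizeof(*sym) + n + 1);
	if (!sym)
		return NULL;
	sym->addr = addr;
	sym->len = (unsigned int)n;
	sym->type = type;
	memcpy(sym->name, name, n + 1);
	return sym;
}
```

With such a layout the compression loops could start at index 0 unconditionally, and there would be no need for sym_start_idx or for output that differs depending on --all-symbols.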
Best Regards,
Petr