Re: [PATCH v4 1/1] riscv: optimize ELF relocation function in riscv
From: Charlie Jenkins
Date: Mon Dec 11 2023 - 20:51:18 EST
On Thu, Dec 07, 2023 at 05:02:16PM -0800, Charlie Jenkins wrote:
> On Wed, Sep 13, 2023 at 04:05:00PM +0300, Maxim Kochetkov wrote:
> > The patch can optimize the running times of insmod command by modify ELF
> > relocation function.
> > In the 5.10 and latest kernel, when install the riscv ELF drivers which
> > contains multiple symbol table items to be relocated, kernel takes a lot
> > of time to execute the relocation. For example, we install a 3+MB driver
> > need 180+s.
> > We focus on the riscv architecture handle R_RISCV_HI20 and R_RISCV_LO20
> > type items relocation function in the arch\riscv\kernel\module.c and
> > find that there are two-loops in the function. If we modify the begin
> > number in the second for-loops iteration, we could save significant time
> > for installation. We install the same 3+MB driver could just need 2s.
> >
> > Signed-off-by: Amma Lee <lixiaoyun@xxxxxxxxxxxxxxx>
> > Signed-off-by: Maxim Kochetkov <fido_max@xxxxxxxx>
> > ---
> > Changes in v4:
> > - use 'while' loop instead of 'for' loop to avoid code duplicate
> > ---
> > arch/riscv/kernel/module.c | 20 ++++++++++++++++----
> > 1 file changed, 16 insertions(+), 4 deletions(-)
> >
> > diff --git a/arch/riscv/kernel/module.c b/arch/riscv/kernel/module.c
> > index 7c651d55fcbd..8c9b644ebfdb 100644
> > --- a/arch/riscv/kernel/module.c
> > +++ b/arch/riscv/kernel/module.c
> > @@ -346,6 +346,7 @@ int apply_relocate_add(Elf_Shdr *sechdrs, const char *strtab,
> > Elf_Sym *sym;
> > u32 *location;
> > unsigned int i, type;
> > + unsigned int j_idx = 0;
> > Elf_Addr v;
> > int res;
> >
> > @@ -384,9 +385,10 @@ int apply_relocate_add(Elf_Shdr *sechdrs, const char *strtab,
> > v = sym->st_value + rel[i].r_addend;
> >
> > if (type == R_RISCV_PCREL_LO12_I || type == R_RISCV_PCREL_LO12_S) {
> > - unsigned int j;
> > + unsigned int j = j_idx;
> > + bool found = false;
> >
> > - for (j = 0; j < sechdrs[relsec].sh_size / sizeof(*rel); j++) {
> > + do {
> > unsigned long hi20_loc =
> > sechdrs[sechdrs[relsec].sh_info].sh_addr
> > + rel[j].r_offset;
> > @@ -415,16 +417,26 @@ int apply_relocate_add(Elf_Shdr *sechdrs, const char *strtab,
> > hi20 = (offset + 0x800) & 0xfffff000;
> > lo12 = offset - hi20;
> > v = lo12;
> > + found = true;
> >
> > break;
> > }
> > - }
> > - if (j == sechdrs[relsec].sh_size / sizeof(*rel)) {
> > +
> > + j++;
> > + if (j > sechdrs[relsec].sh_size / sizeof(*rel))
> > + j = 0;
> Very interesting algorithm here. Assuming the hi relocation is after the
> previous one seems to be a good heuristic. However I think we can do
> better. In GNU ld, a hashmap of all of the hi relocations is stored and
> a list of all of the lo relocations. After all of the other relocations
> have been parsed, it iterates through all of the lo relocations and
> looks up the associated hi relocation in the hashmap.
>
> There is more memory overhead here but I suspect it will be faster. I
> had started to mock up a hashmap implementation to see if it was faster
> but decided I should mention it here first in case somebody had some
> additional insight.
Turns out this is a fantastic heuristic. Using a hashmap is
significantly faster than the default implementation but this algorithm
above is significantly faster than the hashmap. Using the amdgpu driver
(which is actually a collection of drivers) and is a size of about 469M
I found that the hashmap implementation is about 30% faster than the
current implementation, but this patch is 50% faster than the current
implementation. It is probably possible to write an ELF header with the
relocations sufficiently scrambled to make the hashmap faster, but I
suspect that for all "normal" programs this algorithm is faster.
I also tried a couple other smaller modules and it was faster or around
the same as the hashmap in all of them.
A lot of code has changed in this file since this patch was submitted,
can you rebase onto 6.7-rc1? Otherwise this patch is great.
Reviewed-by: Charlie Jenkins <charlie@xxxxxxxxxxxx>
>
> - Charlie
>
> > +
> > + } while (j_idx != j);
> > +
> > + if (!found) {
> > pr_err(
> > "%s: Can not find HI20 relocation information\n",
> > me->name);
> > return -EINVAL;
> > }
> > +
> > + /* Record the previous j-loop end index */
> > + j_idx = j;
> > }
> >
> > res = handler(me, location, v);
> > --
> > 2.40.1
> >
> >
> > _______________________________________________
> > linux-riscv mailing list
> > linux-riscv@xxxxxxxxxxxxxxxxxxx
> > http://lists.infradead.org/mailman/listinfo/linux-riscv