Re: [PATCH v6 03/20] modpost: detect section mismatch for R_ARM_MOVW_ABS_NC and R_ARM_MOVT_ABS

From: Masahiro Yamada
Date: Tue May 23 2023 - 07:59:45 EST


On Tue, May 23, 2023 at 6:50 AM Ard Biesheuvel <ardb@xxxxxxxxxx> wrote:
>
> On Mon, 22 May 2023 at 20:03, Nick Desaulniers <ndesaulniers@xxxxxxxxxx> wrote:
> >
> > + linux-arm-kernel
> >
> > On Sun, May 21, 2023 at 9:05 AM Masahiro Yamada <masahiroy@xxxxxxxxxx> wrote:
> > >
> > > > ARM defconfig fails to detect some section mismatches.
> > >
> > > [test code]
> > >
> > > #include <linux/init.h>
> > >
> > > int __initdata foo;
> > > int get_foo(int x) { return foo; }
> > >
> > > It is apparently a bad reference, but modpost does not report anything
> > > for ARM defconfig (i.e. multi_v7_defconfig).
> > >
> > > The test code above produces the following relocations.
> > >
> > > > Relocation section '.rel.text' at offset 0x200 contains 2 entries:
> > > >  Offset     Info    Type              Sym.Value  Sym. Name
> > > > 00000000  0000062b R_ARM_MOVW_ABS_NC  00000000   .LANCHOR0
> > > > 00000004  0000062c R_ARM_MOVT_ABS     00000000   .LANCHOR0
> > > >
> > > > Relocation section '.rel.ARM.exidx' at offset 0x210 contains 2 entries:
> > > >  Offset     Info    Type              Sym.Value  Sym. Name
> > > > 00000000  0000022a R_ARM_PREL31       00000000   .text
> > > > 00000000  00001000 R_ARM_NONE         00000000   __aeabi_unwind_cpp_pr0
> > >
> > > Currently, R_ARM_MOVW_ABS_NC and R_ARM_MOVT_ABS are just skipped.
> > >
> > > Add code to handle them. I checked arch/arm/kernel/module.c to learn
> > > how the offset is encoded in the instruction.
> > >
> > > The referenced symbol in relocation might be a local anchor.
> > > If is_valid_name() returns false, let's search for a better symbol name.
> > >
> > > Signed-off-by: Masahiro Yamada <masahiroy@xxxxxxxxxx>
> > > ---
> > >
> > > scripts/mod/modpost.c | 12 ++++++++++--
> > > 1 file changed, 10 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c
> > > index 34fbbd85bfde..ed2301e951a9 100644
> > > --- a/scripts/mod/modpost.c
> > > +++ b/scripts/mod/modpost.c
> > > @@ -1108,7 +1108,7 @@ static inline int is_valid_name(struct elf_info *elf, Elf_Sym *sym)
> > > /**
> > > * Find symbol based on relocation record info.
> > > * In some cases the symbol supplied is a valid symbol so
> > > - * return refsym. If st_name != 0 we assume this is a valid symbol.
> > > + * return refsym. If is_valid_name() == true, we assume this is a valid symbol.
> > > * In other cases the symbol needs to be looked up in the symbol table
> > > * based on section and address.
> > > * **/
> > > @@ -1121,7 +1121,7 @@ static Elf_Sym *find_tosym(struct elf_info *elf, Elf64_Sword addr,
> > > >         Elf64_Sword d;
> > > >         unsigned int relsym_secindex;
> > > >
> > > > -       if (relsym->st_name != 0)
> > > > +       if (is_valid_name(elf, relsym))
> > > >                 return relsym;
> > >
> > > /*
> > > @@ -1312,11 +1312,19 @@ static int addend_arm_rel(struct elf_info *elf, Elf_Shdr *sechdr, Elf_Rela *r)
> > > >         unsigned int r_typ = ELF_R_TYPE(r->r_info);
> > > >         Elf_Sym *sym = elf->symtab_start + ELF_R_SYM(r->r_info);
> > > >         unsigned int inst = TO_NATIVE(*reloc_location(elf, sechdr, r));
> > > > +       int offset;
> > > >
> > > >         switch (r_typ) {
> > > >         case R_ARM_ABS32:
> > > >                 r->r_addend = inst + sym->st_value;
> > > >                 break;
> > > > +       case R_ARM_MOVW_ABS_NC:
> > > > +       case R_ARM_MOVT_ABS:
> > > > +               offset = ((inst & 0xf0000) >> 4) | (inst & 0xfff);
> > > > +               offset = (offset ^ 0x8000) - 0x8000;
> >
> > The code in arch/arm/kernel/module.c then right shifts the offset by
> > 16 for R_ARM_MOVT_ABS. Is that necessary?
> >
>
> MOVW/MOVT pairs are limited to an addend of -/+ 32 KiB, and the same
> value must be encoded in both instructions.


In my understanding, 'movt' loads the immediate value into
the upper 16 bits of the register.

I am just curious about the code in arch/arm/kernel/module.c.

Please see 'case R_ARM_MOVT_ABS:' part.

[1] 'offset' is the immediate value encoded in instruction
[2] Add sym->st_value
[3] Right-shift 'offset' by 16
[4] Write it back to the instruction

So, the immediate value encoded in the instruction
is divided by 65536.

I guess we need something like the following?
(left-shift by 16).

        if (ELF32_R_TYPE(rel->r_info) == R_ARM_MOVT_ABS ||
            ELF32_R_TYPE(rel->r_info) == R_ARM_MOVT_PREL)
                offset <<= 16;
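
To make steps [1]-[4] concrete, here is a minimal standalone sketch of
what I understand the sequence to be. It is not the code from
arch/arm/kernel/module.c; the function names are hypothetical, and it
assumes the ARM (A32) encoding where the 16-bit immediate is split into
imm4 (bits 19:16) and imm12 (bits 11:0), as in the patch above.

```c
#include <stdint.h>

/* [1] Pull imm4:imm12 out of an A32 MOVW/MOVT opcode, sign-extend from 16 bits. */
static int32_t movw_movt_addend(uint32_t inst)
{
	int32_t offset = ((inst & 0xf0000) >> 4) | (inst & 0xfff);

	return (offset ^ 0x8000) - 0x8000;
}

/* [4] Write the low 16 bits of 'value' back into the imm4:imm12 fields. */
static uint32_t movw_movt_encode(uint32_t inst, uint32_t value)
{
	inst &= 0xfff0f000;
	inst |= ((value & 0xf000) << 4) | (value & 0x0fff);
	return inst;
}

/* Steps [1]-[4] in order; 'is_movt' enables step [3] (hypothetical helper). */
static uint32_t apply_mov_reloc(uint32_t inst, uint32_t sym_value, int is_movt)
{
	uint32_t value = (uint32_t)movw_movt_addend(inst) + sym_value; /* [1] + [2] */

	if (is_movt)
		value >>= 16;                                           /* [3] */
	return movw_movt_encode(inst, value);                           /* [4] */
}
```

With a made-up symbol address of 0xc0123450 and an encoded immediate
of 4, this turns 'movw r0, #4' (0xe3000004) into 0xe3030454 and
'movt r0, #4' (0xe3400004) into 0xe34c0012, so in this sketch the
encoded immediate is treated as the same unshifted addend in both
instructions.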




>
> When constructing the actual immediate value from the symbol value and
> the addend, only the top 16 bits are used in MOVT and the bottom 16
> bits in MOVW.
>
> However, this code seems to borrow the Elf_Rela::addend field (which
> ARM does not use natively) to record the intermediate value, which
> would need to be split if it is used to fix up instruction opcodes.

At first, modpost supported only RELA for section mismatch checks.

Later, 2c1a51f39d95 ("[PATCH] kbuild: check SHT_REL sections")
added REL support.

But, the common code still used Elf_Rela.


modpost does not need to write back the fixed instruction.
modpost is only interested in the offset address.

Currently, modpost saves the offset address in
r->r_addend even for Rel, which has no addend field.
I do not like this code.

So, I am trying to reduce the use of Elf_Rela.
For example, this patch.
https://patchwork.kernel.org/project/linux-kbuild/patch/20230521160426.1881124-8-masahiroy@xxxxxxxxxx/
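
As a rough illustration of the direction (this is not the code from
the patch linked above), the referenced address could be returned as a
plain value instead of being stored in a borrowed Rela field. The
struct and function names below are hypothetical, simplified stand-ins
for Elf32_Rel and Elf32_Rela, and the sketch assumes the section data
is already in host byte order (no TO_NATIVE()).

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for Elf32_Rel and Elf32_Rela (hypothetical names). */
struct rel_rec  { uint32_t r_offset; uint32_t r_info; };
struct rela_rec { uint32_t r_offset; uint32_t r_info; int32_t r_addend; };

/*
 * REL: the addend is stored in the word being relocated, so read it
 * from the section contents. Shown for an ABS32-style relocation.
 */
static uint32_t abs32_target_rel(const uint32_t *sec,
				 const struct rel_rec *r, uint32_t sym_value)
{
	uint32_t inst = sec[r->r_offset / sizeof(uint32_t)];

	return inst + sym_value;
}

/* RELA: the addend is carried in the relocation record itself. */
static uint32_t abs32_target_rela(const struct rela_rec *r, uint32_t sym_value)
{
	return sym_value + (uint32_t)r->r_addend;
}
```

Both helpers only read their inputs; nothing is written back to the
relocation record or to the instruction.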


> Btw the Thumb2 encodings of MOVT and MOVW seem to be missing here.

Right. If CONFIG_THUMB2_KERNEL=y, the section mismatch check is
incomplete: several relocation types are just skipped.






>
>
> > > > +               offset += sym->st_value;
> > > > +               r->r_addend = offset;
> > > > +               break;
> > > >         case R_ARM_PC24:
> > > >         case R_ARM_CALL:
> > > >         case R_ARM_JUMP24:
> > > --
> > > 2.39.2
> > >
> >
> >
> > --
> > Thanks,
> > ~Nick Desaulniers
--
Best Regards
Masahiro Yamada