Fixing ELF loader for systems with oversized pages [was: Re: [musl] Segmentation fault musl 1.2.4]
From: Rich Felker
Date: Tue Jan 30 2024 - 15:32:09 EST
On Tue, Jan 30, 2024 at 10:37:30AM -0500, Rich Felker wrote:
> On Tue, Jan 30, 2024 at 11:43:38AM +0100, Szabolcs Nagy wrote:
> > * Rich Felker <dalias@xxxxxxxx> [2024-01-16 15:45:52 -0500]:
> >
> > > On Tue, Jan 16, 2024 at 07:29:18PM +0100, Szabolcs Nagy wrote:
> > > > * Cody Wetzel <codyawetzel@xxxxxxxxx> [2024-01-16 09:21:05 -0600]:
> > > > > Here is the output for the old
> > > > > ....
> > > > > >
> > > > > > / # /tmp/ld-musl-armhf.so.1 /usr/bin/readelf -lW /tmp/ld-musl-armhf.so.1
> > > > > >
> > > > > > Elf file type is DYN (Shared object file)
> > > > > > Entry point 0x359cd
> > > > > > There are 6 program headers, starting at offset 52
> > > > > >
> > > > > > Program Headers:
> > > > > > Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
> > > > > > EXIDX 0x07acec 0x0007acec 0x0007acec 0x00008 0x00008 R 0x4
> > > > > > LOAD 0x000000 0x00000000 0x00000000 0x7acf4 0x7acf4 R E 0x10000
> > > > > > LOAD 0x07fd6c 0x0008fd6c 0x0008fd6c 0x0054a 0x02258 RW 0x10000
> > > >
> > > > this load segment is 64k aligned.
> > > >
> > > > > > DYNAMIC 0x07febc 0x0008febc 0x0008febc 0x000c0 0x000c0 RW 0x4
> > > > > > GNU_STACK 0x000000 0x00000000 0x00000000 0x00000 0x00000 RW 0x10
> > > > > > GNU_RELRO 0x07fd6c 0x0008fd6c 0x0008fd6c 0x00294 0x00294 R 0x1
> > > > > >
> > > > > > Section to Segment mapping:
> > > > > > Segment Sections...
> > > > > > 00 .ARM.exidx
> > > > > > 01 .hash .gnu.hash .dynsym .dynstr .rel.dyn .rel.plt .plt .text
> > > > > > .rodata .ARM.exidx
> > > > > > 02 .data.rel.ro .dynamic .got .data .bss
> > > > > > 03 .dynamic
> > > > > > 04
> > > > > > 05 .data.rel.ro .dynamic .got
> > > > > >
> > > > >
> > > > > And the new...
> > > > >
> > > > > / # /tmp/ld-musl-armhf.so.1 /usr/bin/readelf -lW /lib/ld-musl-armhf.so.1
> > > > > >
> > > > > > Elf file type is DYN (Shared object file)
> > > > > > Entry point 0x362f1
> > > > > > There are 6 program headers, starting at offset 52
> > > > > >
> > > > > > Program Headers:
> > > > > > Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
> > > > > > EXIDX 0x07b81c 0x0007b81c 0x0007b81c 0x00008 0x00008 R 0x4
> > > > > > LOAD 0x000000 0x00000000 0x00000000 0x7b824 0x7b824 R E 0x1000
> > > > > > LOAD 0x07bd74 0x0007cd74 0x0007cd74 0x0054a 0x0225c RW 0x1000
> > > >
> > > > this load segment is 4k aligned and offset vs addr is not congruent
> > > > modulo 64k, or 32k, so won't work on systems with such page size.
> > > >
> > > > > > DYNAMIC 0x07bebc 0x0007cebc 0x0007cebc 0x000c0 0x000c0 RW 0x4
> > > > > > GNU_STACK 0x000000 0x00000000 0x00000000 0x00000 0x00000 RW 0x10
> > > > > > GNU_RELRO 0x07bd74 0x0007cd74 0x0007cd74 0x0028c 0x0028c R 0x1
> > > > > >
> > > > > > Section to Segment mapping:
> > > > > > Segment Sections...
> > > > > > 00 .ARM.exidx
> > > > > > 01 .hash .gnu.hash .dynsym .dynstr .rel.dyn .rel.plt .plt .text
> > > > > > .rodata .ARM.exidx
> > > > > > 02 .data.rel.ro .dynamic .got .data .bss
> > > > > > 03 .dynamic
> > > > > > 04
> > > > > > 05 .data.rel.ro .dynamic .got
> > > > >
> > > > >
> > > > > I hope that helps.
> > > >
> > > > yes, this is a linking issue, not musl libc.
> > > >
> > > > alpine linux links binaries for 4k pagesize only.
> > > >
> > > > arm linkers were updated at some point to create binaries supporting
> > > > up to 64k pagesize. i suspect some ppl ran into issues in practice
> > > > and decided the larger binaries are not worth it, if they dont work
> > > > reliably and forced 4k page size at link time.
> > > >
> > > > you have to raise an issue with alpine linux, if you think 32k
> > > > oage size is useful and reliably supportable.
> > >
> > > Are they using -Wl,-z,separate-code? That incurs a large
> > > binary-size-on-disk penalty when supporting oversized pages, and IIRC
> > > something was done to make the linker default to not supporting
> > > oversized pages when that's used. It might be the reason, if arm
> > > linking is normally expected to use a larger max pagesize.
> >
> > i looked at this now, turns out they just changed the
> > pagesize back to 4k (i missed this change):
> >
> > https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=1a26a53a0dee39106ba58fcb15496c5f13074652
>
> This doesn't help immediately, but a major ingredient to fix this
> situation would be getting the kernel to stop doing the wrong thing.
> Right now, it's ignoring the fact that the ELF program header
> constraints are incompatible with mmap given the oversized system
> pagesize, and just incorrectly mapping the executable and trying to
> run it anyway, whereby it blows up.
>
> The right thing to do would be either to fail with ENOEXEC in this
> case, or when mmap with the required offset constraint fails, falling
> back to making an anonymous map and copying the whole content of the
> loadable segment into that (no COW sharing). The latter is really not
> all that bad for got/data/etc. mappings which you expect will be dirty
> (modified) anyway.
>
> BTW the former choice (ENOEXEC) would allow doing the latter in
> userspace with a binfmt_misc loader.
Completely untested draft patch showing the concept is attached.
Rich
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index f8c7f26f1fbb..45c50f379377 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -861,6 +861,12 @@ static int load_elf_binary(struct linux_binprm *bprm)
if (!elf_phdata)
goto out;
+ elf_ppnt = elf_phdata;
+ for (i = 0; i < elf_ex->e_phnum; i++, elf_ppnt++) {
+ if (elf_ppnt->p_type != PT_LOAD) continue;
+ if (ELF_PAGEOFFSET(elf_ppnt->p_vaddr - elf_ppnt->p_offset))
+ goto out;
+ }
elf_ppnt = elf_phdata;
for (i = 0; i < elf_ex->e_phnum; i++, elf_ppnt++) {
char *elf_interpreter;
@@ -962,6 +968,13 @@ static int load_elf_binary(struct linux_binprm *bprm)
if (!interp_elf_phdata)
goto out_free_dentry;
+ elf_ppnt = interp_elf_phdata;
+ for (i = 0; i < elf_ex->e_phnum; i++, elf_ppnt++) {
+ if (elf_ppnt->p_type != PT_LOAD) continue;
+ if (ELF_PAGEOFFSET(elf_ppnt->p_vaddr - elf_ppnt->p_offset))
+ goto out_free_dentry;
+ }
+
/* Pass PT_LOPROC..PT_HIPROC headers to arch code */
elf_property_phdata = NULL;
elf_ppnt = interp_elf_phdata;