Re: [PATCH RFC] binfmt_elf: fully allocate bss pages

From: Thomas Weißschuh
Date: Fri Sep 15 2023 - 18:51:39 EST


On 2023-09-15 23:15:05+0100, Pedro Falcato wrote:
> On Fri, Sep 15, 2023 at 4:54 AM Thomas Weißschuh <linux@xxxxxxxxxxxxxx> wrote:
> >
> > When allocating the pages for bss the start address needs to be rounded
> > down instead of up.
> > Otherwise the start of the bss segment may be unmapped.
> >
> > The was reported to happen on Aarch64:
> >
> > Memory allocated by set_brk():
> > Before: start=0x420000 end=0x420000
> > After: start=0x41f000 end=0x420000
> >
> > The triggering binary looks like this:
> >
> > Elf file type is EXEC (Executable file)
> > Entry point 0x400144
> > There are 4 program headers, starting at offset 64
> >
> > Program Headers:
> > Type Offset VirtAddr PhysAddr
> > FileSiz MemSiz Flags Align
> > LOAD 0x0000000000000000 0x0000000000400000 0x0000000000400000
> > 0x0000000000000178 0x0000000000000178 R E 0x10000
> > LOAD 0x000000000000ffe8 0x000000000041ffe8 0x000000000041ffe8
> > 0x0000000000000000 0x0000000000000008 RW 0x10000
> > NOTE 0x0000000000000120 0x0000000000400120 0x0000000000400120
> > 0x0000000000000024 0x0000000000000024 R 0x4
> > GNU_STACK 0x0000000000000000 0x0000000000000000 0x0000000000000000
> > 0x0000000000000000 0x0000000000000000 RW 0x10
> >
> > Section to Segment mapping:
> > Segment Sections...
> > 00 .note.gnu.build-id .text .eh_frame
> > 01 .bss
> > 02 .note.gnu.build-id
> > 03
> >
> > Reported-by: Sebastian Ott <sebott@xxxxxxxxxx>
> > Closes: https://lore.kernel.org/lkml/5d49767a-fbdc-fbe7-5fb2-d99ece3168cb@xxxxxxxxxx/
> > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> > Cc: stable@xxxxxxxxxxxxxxx
> > Signed-off-by: Thomas Weißschuh <linux@xxxxxxxxxxxxxx>
> > ---
> >
> > I'm not really familiar with the ELF loading process, so putting this
> > out as RFC.
> >
> > A example binary compiled with aarch64-linux-gnu-gcc 13.2.0 is available
> > at https://test.t-8ch.de/binfmt-bss-repro.bin
> > ---
> > fs/binfmt_elf.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
> > index 7b3d2d491407..4008a57d388b 100644
> > --- a/fs/binfmt_elf.c
> > +++ b/fs/binfmt_elf.c
> > @@ -112,7 +112,7 @@ static struct linux_binfmt elf_format = {
> >
> > static int set_brk(unsigned long start, unsigned long end, int prot)
> > {
> > - start = ELF_PAGEALIGN(start);
> > + start = ELF_PAGESTART(start);
> > end = ELF_PAGEALIGN(end);
> > if (end > start) {
> > /*
>
> I don't see how this change can be correct. set_brk takes the start of
> .bss as the start, so doing ELF_PAGESTART(start) will give you what
> may very well be another ELF segment. In the common case, you'd map an
> anonymous page on top of someone's .data, which will misload the ELF.

That does make sense, and indeed it breaks more complex binaries.

> The current logic looks OK to me (gosh this code would ideally take a
> good refactoring...). I still can't quite tell how padzero() (in the
> original report) is -EFAULTing though.

As a test I replaced the asm clear_user() in padzero() with the generic
memset()-based implementation from include/asm-generic/uaccess.h.
It does provide better diagnostics, see below.

Who should have mapped this partial .bss page if there is no .data?
Maybe the logic needs to be a bit more complex and check if this page
has been already mapped for .data and in that case don't map it again.



[ 5.620235] Run /init as init process
[ 5.662763] CUSTOM DEBUG ELF_PAGEALIGN(start)=0x420000 ELF_PAGEALIGN(end)=0x420000 ELF_PAGESTART(0x41f000)
[ 5.667176] Unable to handle kernel paging request at virtual address 000000000041ffe8
[ 5.668062] Mem abort info:
[ 5.668429] ESR = 0x0000000096000045
[ 5.669400] EC = 0x25: DABT (current EL), IL = 32 bits
[ 5.670119] SET = 0, FnV = 0
[ 5.670608] EA = 0, S1PTW = 0
[ 5.671172] FSC = 0x05: level 1 translation fault
[ 5.672024] Data abort info:
[ 5.673273] ISV = 0, ISS = 0x00000045, ISS2 = 0x00000000
[ 5.674169] CM = 0, WnR = 1, TnD = 0, TagAccess = 0
[ 5.674991] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[ 5.676871] user pgtable: 4k pages, 39-bit VAs, pgdp=0000000043f20000
[ 5.677776] [000000000041ffe8] pgd=0800000043c62003, p4d=0800000043c62003, pud=0800000043c62003, pmd=0000000000000000
[ 5.681522] Internal error: Oops: 0000000096000045 [#1] PREEMPT SMP
[ 5.682604] Modules linked in:
[ 5.683576] CPU: 0 PID: 1 Comm: init Not tainted 6.6.0-rc1+ #241 00a261b9689606c4fc0c90eb29739c5b0eec7b82
[ 5.684706] Hardware name: linux,dummy-virt (DT)
[ 5.685462] pstate: 00000005 (nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 5.686094] pc : __memset+0x50/0x188
[ 5.686572] lr : padzero+0x84/0xa0
[ 5.686956] sp : ffffffc08003bc70
[ 5.687307] x29: ffffffc08003bc70 x28: 0000000000000000 x27: ffffff80026afa00
[ 5.688091] x26: 000000000041fff0 x25: 000000000041ffe8 x24: 0000000000400144
[ 5.688698] x23: 0000000000000000 x22: 0000000000000000 x21: 0000000000000000
[ 5.689275] x20: 0000000000000fe8 x19: 000000000041ffe8 x18: ffffffffffffffff
[ 5.689928] x17: ffffffdf46cd2984 x16: ffffffdf46cd2880 x15: 0720072007200720
[ 5.690597] x14: 0720072007200720 x13: 0720072007200720 x12: 0000000000000000
[ 5.691192] x11: 00000000ffffefff x10: 0000000000000000 x9 : ffffffdf46e1ba28
[ 5.691906] x8 : 000000000041ffe8 x7 : 0000000000000000 x6 : 0000000000057fa8
[ 5.692496] x5 : 0000000000000fff x4 : 0000000000000008 x3 : 0000000000000000
[ 5.693168] x2 : 0000000000000018 x1 : 0000000000000000 x0 : 000000000041ffe8
[ 5.693985] Call trace:
[ 5.694318] __memset+0x50/0x188
[ 5.694708] load_elf_binary+0x630/0x15d0
[ 5.695132] bprm_execve+0x2bc/0x7c0
[ 5.695505] kernel_execve+0x144/0x1c8
[ 5.695882] run_init_process+0xf8/0x110
[ 5.696264] kernel_init+0x8c/0x200
[ 5.696624] ret_from_fork+0x10/0x20
[ 5.697216] Code: d65f03c0 cb0803e4 f2400c84 54000080 (a9001d07)
[ 5.698936] ---[ end trace 0000000000000000 ]---
[ 5.701625] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[ 5.702502] SMP: stopping secondary CPUs
[ 5.703608] Kernel Offset: 0x1ec6a00000 from 0xffffffc080000000
[ 5.704119] PHYS_OFFSET: 0x40000000
[ 5.704491] CPU features: 0x0000000d,00020000,0000420b
[ 5.705276] Memory Limit: none