Re: [PATCH] nolibc: optimise _start() on x86_64

From: Willy Tarreau
Date: Sun Dec 03 2023 - 16:37:57 EST


On Sun, Dec 03, 2023 at 03:00:48PM +0300, Alexey Dobriyan wrote:
> On Sat, Dec 02, 2023 at 02:23:59PM +0100, Willy Tarreau wrote:
> > Hi Alexey,
> >
> > On Sat, Dec 02, 2023 at 03:45:13PM +0300, Alexey Dobriyan wrote:
> > > Just jump into _start_c, it is not going to return anyway.
> >
> > Thanks, but what's upper in the stack there ?
>
> argc
>
> (gdb) break _start
> (gdb) run
>
> (gdb) x/20gx $sp
> 0x7fffffffdae0: 0x0000000000000004 0x00007fffffffdf33
> 0x7fffffffdaf0: 0x00007fffffffdf49 0x00007fffffffdf4b
> 0x7fffffffdb00: 0x00007fffffffdf4d 0x0000000000000000
> 0x7fffffffdb10: 0x00007fffffffdf4f 0x00007fffffffdf70
> 0x7fffffffdb20: 0x00007fffffffdf80 0x00007fffffffdfce
>
> (gdb) x/s 0x00007fffffffdf33
> 0x7fffffffdf33: "/home/ad/s-test/a.out"
>
> > I'm trying to make sure
> > that if _start_c returns we don't get a random behavior.
>
> Yes, it should segfault executing from very small address.
> I tested with
>
> .intel_syntax noprefix
> .globl _start
> _start:
> ret
> mov eax, 231
> xor edi, edi
> syscall

Well, this could possibly be acceptable then but the ABI also says that
we need rsp to be 16-byte aligned before the call, so it's supposed to
be 8 on top of this, so this would actually require more code to maintain
this guarantee, since a sub rsp,8 is longer than just the hlt we're saving.

> > If we get a
> > systematic crash (e.g. 0 always there) that's fine, what would be
> > annoying would be random infinite loops etc. In the psABI description
> > (table 3.9) I'm seeing "undefined" before argc, which I don't find
> > much appealing.
> >
> > > Signed-off-by: Alexey Dobriyan <adobriyan@xxxxxxxxx>
> > > ---
> > >
> > > Also, kernel clears all registers before starting process,
> > > I'm not sure why
> > >
> > > xor ebp, ebp
> > >
> > > was added.
> >
> > Hmmm psABI says:
> >
> > Only the registers listed below have specied values at process entry:
> >
> > %rbp The content of this register is unspecied at process initialization
> > time, but the user code should mark the deepest stack frame by setting
> > the frame pointer to zero.
> >
> > %rsp The stack pointer holds the address of the byte with lowest address
> > which is part of the stack. It is guaranteed to be 16-byte aligned at
> > process entry.
> >
> > %rdx a function pointer that the application should register with atexit (BA_OS).
> >
> > Thus apparently it's documented as being our job to clear it :-/
>
> I meant, ELF loader clears all registers except rsp and aligns the stack to 16 bytes.
> There were problems with stack aligning, but registers, I think, were always zeroed.

But there's a strong difference between what's observed and what's
specified. If you get the x86_64 ABI spec to reflect this, it becomes
the standard and we can rely on it. Otherwise the standard remains what
is documented, and what is implemented may change while remaining within
the specs above.

Willy