Re: [PATCH 3/3] tools/nolibc: x86_64: wrap asm functions in functions
From: Willy Tarreau
Date: Sat Aug 10 2024 - 10:36:19 EST
Hi Thomas,
On Sat, Aug 10, 2024 at 02:37:19PM +0200, Thomas Weißschuh wrote:
> On 2024-08-10 19:12:25+0000, Ammar Faizi wrote:
> > On Sat, Aug 10, 2024 at 12:54:46PM +0200, Thomas Weißschuh wrote:
> > > +__attribute__((weak,unused,section(".text.nolibc_memmove")))
> > > +__nolibc_naked __no_stack_protector
> > > +void *memmove(void *dst __attribute__((unused)),
> > > + const void *src __attribute__((unused)),
> > > + size_t len __attribute__((unused)))
> > > +{
> > > + __asm__ volatile (
> > > + "movq %rdx, %rcx\n\t"
> > > + "movq %rdi, %rax\n\t"
> > > + "movq %rdi, %rdx\n\t"
> > > + "subq %rsi, %rdx\n\t"
> > > + "cmpq %rcx, %rdx\n\t"
> > > + "jb .Lbackward_copy\n\t"
> > > + "rep movsb\n\t"
> > > + "retq\n"
> > > + ".Lbackward_copy:"
> > > + "leaq -1(%rdi, %rcx, 1), %rdi\n\t"
> > > + "leaq -1(%rsi, %rcx, 1), %rsi\n\t"
> > > + "std\n\t"
> > > + "rep movsb\n\t"
> > > + "cld\n\t"
> > > + "retq\n"
> > > + );
> > > + __nolibc_naked_epilogue();
> > > +}
> >
> > NAK for this patch.
>
> Thanks for the feedback!
>
> (I'm not an assembler programmer, so regard my notes with a grain of salt)
>
> > This approach appears highly dangerous, particularly when the compiler
> > inlines the call. When using inline assembly within a function, it's
> > crucial to define proper constraints and a clobber list to ensure the
> > arguments are correctly bound to the inline assembly.
>
> Aren't the constraints a feature of Extended Asm?
> This is a Basic Asm block.
> Indeed naked functions only support Basic Asm, so there is no way to
> explicitly bind arguments to their registers.
That's indeed what is said in GNU docs, but a quick test with gcc 9 and 11
shows me that both accept both naked and extended asm. However clang doesn't
seem happy about it.
The problem here is dual:
- first, the naked attribute can be emulated if not supported by the
compiler and we'll only have optimize(-Os,-fomit-frame-pointer), and
in this case the compiler does not respect well the registers (I'm
seeing bad code being emitted in callers if I mark the function
static for example).
- second, nothing prevents the compiler from inlining that function
and in this case, as Ammar says, we have no control over what
registers arguments will be placed into since the main purpose of
inlining precisely is to build optimal code that limits register
moves. However, in my tests, it appears that when marked naked,
the compiler never inlines it and even emits a warning if I try
to mark it inline, saying it was already noinline. So maybe naked
implies noinline.
In any case, depending on the availability of naked, we have two clearly
different layouts to deal with :-/
We could imagine not marking it naked and keeping the optimize(Os...)
stuff only, then using extended asm like in other functions maybe.
Otherwise this could require two versions, which is less fun.
> Looking at the object code for various on both gcc and clang show always
> the same object code.
> (Although GCC adds a "ud2" after the "ret")
It *seems* so as well for me but in the not-really-naked case, you
really have zero control over it. Also one thing it does to me in the
not-naked-case is that registers are not saved by the caller and are
clobbered by the asm statement:
void *memmove2(void *dst __attribute__((unused)),
void *src __attribute__((unused)),
size_t len __attribute__((unused)))
{
memmove(dst, src, len);
memmove(src, dst, len);
return 0;
}
Gives me this with the naked attribute:
0000000000000025 <memmove2>:
25: 55 push %rbp
26: 48 89 f5 mov %rsi,%rbp
29: 50 push %rax
2a: 48 89 14 24 mov %rdx,(%rsp)
2e: e8 00 00 00 00 callq 33 <memmove2+0xe>
33: 48 8b 14 24 mov (%rsp),%rdx
37: 48 89 ef mov %rbp,%rdi
3a: 48 89 c6 mov %rax,%rsi
3d: e8 00 00 00 00 callq 42 <memmove2+0x1d>
42: 31 c0 xor %eax,%eax
44: 5a pop %rdx
45: 5d pop %rbp
46: c3 retq
But the alternate form (optimize(-Os...)):
0000000000000024 <memmove2>:
24: 49 89 f0 mov %rsi,%r8
27: e8 00 00 00 00 callq 2c <memmove2+0x8>
2c: 48 89 fe mov %rdi,%rsi
2f: 4c 89 c7 mov %r8,%rdi
32: e8 00 00 00 00 callq 37 <memmove2+0x13>
37: 31 c0 xor %eax,%eax
39: c3 retq
See how rdi and rdx were expected to be preserved after the first call
but were not? This was with gcc-9.5 (which supports naked but it's for
illustration purposes of the risk of leaving unconstrained asm like this).
I also managed to get clang to complain about the .Lbackward_copy label
being already defined, but I don't know how I managed to do it. I think
we should not leave it as a global label like this and instead just use
the regular numbered labels if the asm is inlined.
Also I'm wondering why there are errors about memcpy and memmove being
already defined with -flto, because these ones are marked "weak". Maybe
we need to add something else, that gcc emits with the functions when
using lto ?
Just my two cents,
Willy