Re: C aggregate passing (Rust kernel policy)

From: Willy Tarreau
Date: Sat Feb 22 2025 - 01:33:37 EST


On Fri, Feb 21, 2025 at 09:45:01PM +0000, David Laight wrote:
> On Fri, 21 Feb 2025 11:12:27 -0800
> Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> > On Fri, 21 Feb 2025 at 10:34, David Laight <david.laight.linux@xxxxxxxxx> wrote:
> > >
> > > As Linus said, most modern ABI pass short structures in one or two registers
> > > (or stack slots).
> > > But aggregate returns are always done by passing a hidden pointer argument.
> > >
> > > It is annoying that double-sized integers (u64 on 32bit and u128 on 64bit)
> > > are returned in a register pair - but similar sized structures have to be
> > > returned by value.
> >
> > No, they really don't. At least not on x86 and arm64 with our ABI.
> > Two-register structures get returned in registers too.
> >
> > Try something like this:
> >
> > struct a {
> > unsigned long val1, val2;
> > } function(void)
> > { return (struct a) { 5, 100 }; }
> >
> > and you'll see both gcc and clang generate
> >
> > movl $5, %eax
> > movl $100, %edx
> > retq
> >
> > (and you'll see similar code on other architectures).
>
> Humbug, I'm sure it didn't do that the last time I tried it.

You didn't dream it: most likely the last time you tried it was on
a 32-bit arch like i386 or ARM. Gcc doesn't do that there, most
likely for historic reasons that couldn't be changed later. Instead
the caller passes a hidden pointer argument and the callee writes
the data there:

00000000 <fct>:
0: 8b 44 24 04 mov 0x4(%esp),%eax
4: c7 00 05 00 00 00 movl $0x5,(%eax)
a: c7 40 04 64 00 00 00 movl $0x64,0x4(%eax)
11: c2 04 00 ret $0x4

You can improve it slightly with -mregparm, which passes the hidden
pointer in a register, but that's all, and I never found an option
or attribute to change that:

00000000 <fct>:
0: c7 00 05 00 00 00 movl $0x5,(%eax)
6: c7 40 04 64 00 00 00 movl $0x64,0x4(%eax)
d: c3 ret

ARM does the same on 32 bits:

00000000 <fct>:
0: 2105 movs r1, #5
2: 2264 movs r2, #100 ; 0x64
4: e9c0 1200 strd r1, r2, [r0]
8: 4770 bx lr

I think it's simply that this practice arrived long after these old
architectures were fairly common and it was too late to change their
ABI. But x86_64 and aarch64 had the opportunity to benefit from this.
For example, gcc-3.4 on x86_64 already does the right thing:

0000000000000000 <fct>:
0: ba 64 00 00 00 mov $0x64,%edx
5: b8 05 00 00 00 mov $0x5,%eax
a: c3 retq

So does aarch64 since the oldest gcc I have that supports it (linaro 4.7):

0000000000000000 <fct>:
0: d28000a0 mov x0, #0x5 // #5
4: d2800c81 mov x1, #0x64 // #100
8: d65f03c0 ret
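
For anyone wanting to reproduce the listings above, a minimal test file
along these lines should be enough. This is only a sketch: the file name
pair.c is made up, the struct and values are those of Linus's example, and
the function is named fct to match the listings.

```c
/* pair.c (hypothetical file name): two-word struct returned by value. */
struct a {
	unsigned long val1, val2;
};

/* Returns both words at once; how they travel back to the caller
 * (register pair or hidden pointer) depends on the target ABI. */
struct a fct(void)
{
	return (struct a) { 5, 100 };
}
```

Building with something like "gcc -O2 -c pair.c" (adding -m32 for i386, or
using a cross-compiler for ARM) and then running "objdump -d pair.o" should
give comparable output. The exact flags behind the listings above are not
stated, so treat these as an assumption.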

For my use cases I consider that older architectures gain nothing from
it but lose nothing either, while newer ones benefit significantly
from the approach; that's why I'm using it extensively.
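
As an illustration of that kind of use, here is a sketch of a function
returning a (status, value) pair by value. The names struct result and
parse_uint, as well as the 0 / -1 status convention, are hypothetical and
only serve the example.

```c
#include <stddef.h>

/* Hypothetical (status, value) pair: two machine words, so it is
 * eligible for the two-register return on x86_64 and aarch64. */
struct result {
	long status;		/* 0 on success, -1 on error (made-up convention) */
	unsigned long value;	/* only meaningful when status == 0 */
};

/* Parse an unsigned decimal string. Illustrative helper only:
 * overflow is deliberately not checked to keep the sketch short. */
struct result parse_uint(const char *s)
{
	unsigned long v = 0;

	if (s == NULL || *s == '\0')
		return (struct result) { .status = -1, .value = 0 };

	for (; *s; s++) {
		if (*s < '0' || *s > '9')
			return (struct result) { .status = -1, .value = 0 };
		v = v * 10 + (unsigned long)(*s - '0');
	}
	return (struct result) { .status = 0, .value = v };
}
```

The caller gets both fields from a single call, without an output pointer
argument and without reserving a sentinel value for errors.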

Quite frankly, there's no reason to avoid using this for pairs of pointers
or (status,value) pairs or coordinates etc. And if you absolutely need to
also support 32-bit archs optimally, you can do it using macros that pack
your struct into a double-sized integer and unpack it on the other side:

struct a {
unsigned long v1, v2;
};

/* 32-bit archs only: assumes v1 and v2 each fit in 32 bits */
#define MKPAIR(x) (((unsigned long long)((x).v1) << 32) | ((x).v2))
#define GETPAIR(x) ({ unsigned long long _x = (x); (struct a){ .v1 = (_x >> 32), .v2 = (_x)}; })

unsigned long long fct(void)
{
struct a a = { 5, 100 };
return MKPAIR(a);
}

long caller(void)
{
struct a a = GETPAIR(fct());
return a.v1 + a.v2;
}

00000000 <fct>:
0: b8 64 00 00 00 mov $0x64,%eax
5: ba 05 00 00 00 mov $0x5,%edx
a: c3 ret

0000000b <caller>:
b: b8 69 00 00 00 mov $0x69,%eax
10: c3 ret
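
The same packing can also be sketched without the GNU statement expression,
using fixed-width types and static inline helpers. Everything named here
(struct pair32, mkpair, getpair, fct64, caller64) is hypothetical, and the
fields are assumed to be exactly 32 bits wide.

```c
#include <stdint.h>

/* Hypothetical pair of 32-bit values. */
struct pair32 {
	uint32_t v1, v2;
};

/* Pack v1 into the upper half and v2 into the lower half. */
static inline uint64_t mkpair(struct pair32 p)
{
	return ((uint64_t)p.v1 << 32) | p.v2;
}

/* Inverse of mkpair(): split the 64-bit value back into the pair. */
static inline struct pair32 getpair(uint64_t x)
{
	return (struct pair32) { .v1 = (uint32_t)(x >> 32), .v2 = (uint32_t)x };
}

uint64_t fct64(void)
{
	return mkpair((struct pair32) { 5, 100 });
}

long caller64(void)
{
	struct pair32 a = getpair(fct64());

	return (long)a.v1 + (long)a.v2;
}
```

Since a 64-bit integer is returned in a register pair on the 32-bit archs
discussed here, this should compile to much the same code as the macro
version, though that has not been checked against the listing above.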

But quite frankly, given how little 32-bit archs matter these days, I
don't think it's worth the effort.

Hoping this helps,
Willy