Re: C aggregate passing (Rust kernel policy)

From: Kent Overstreet
Date: Sat Feb 22 2025 - 15:00:34 EST


On Sat, Feb 22, 2025 at 11:18:33AM -0800, Linus Torvalds wrote:
> On Sat, 22 Feb 2025 at 10:54, Kent Overstreet <kent.overstreet@xxxxxxxxx> wrote:
> >
> > If that work is successful it could lead to significant improvements in
> > code generation, since aliasing causes a lot of unnecessary spills and
> > reloads - VLIW could finally become practical.
>
> No.
>
> Compiler people think aliasing matters. It very seldom does. And VLIW
> will never become practical for entirely unrelated reasons (read: OoO
> is fundamentally superior to VLIW in general purpose computing).

OoO and VLIW are orthogonal, not exclusive, and we always want to go
wider if we can. Separately, the neverending gift that is Spectre should
be making everyone reconsider how reliant we've become on OoO.

We'll never get rid of OoO, I agree on that point. But I think it's
worth some thought experiments about how many branches actually need to
be there vs. how many are there because everyone's assumed "branches are
cheap! (so it's totally fine if the CPU sucks at the alternatives)" on
both the hardware and software side.

e.g. cmov historically sucked (and may still, I don't know), but a _lot_
of branches should just be dumb ALU ops. I wince at a lot of the
assembly I see gcc generate for e.g. short multiword integer
comparisons; there are a ton of places where it'll emit 3 or 5 branches
when 1 is all you'd need if we had better ALU primitives.
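
Something like this, to make it concrete (a toy, nothing from the tree -
comparing two 128-bit values stored as hi/lo word pairs):

struct u128 { unsigned long long hi, lo; };

/* The obvious way - compilers tend to turn this into a branch per
 * comparison: */
int cmp_branchy(struct u128 a, struct u128 b)
{
	if (a.hi < b.hi)
		return -1;
	if (a.hi > b.hi)
		return 1;
	if (a.lo < b.lo)
		return -1;
	if (a.lo > b.lo)
		return 1;
	return 0;
}

/* The same comparison as straight-line ALU ops: each (x > y) - (x < y)
 * is -1/0/1 with no control flow, and the final select is cmov material
 * rather than a branch: */
int cmp_branchless(struct u128 a, struct u128 b)
{
	int hi = (a.hi > b.hi) - (a.hi < b.hi);
	int lo = (a.lo > b.lo) - (a.lo < b.lo);

	return hi ? hi : lo;
}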

> Aliasing is one of those bug-bears where compiler people can make
> trivial code optimizations that look really impressive. So compiler
> people *love* having simplistic aliasing rules that don't require real
> analysis, because the real analysis is hard (not just expensive, but
> basically unsolvable).

I don't think crazy compiler experiments from crazy C people have much
relevance here. I'm talking about if/when Rust is able to get this
right.

> The C standards body has been much too eager to embrace "undefined behavior".

Agree on C, but for the rest I think you're just failing to imagine what
we could have if everything weren't tied to a language with
broken/missing semantics w.r.t. aliasing.

Yes, C will never get a memory model that gets rid of the spills and
reloads. But Rust just might. It's got the right model at the reference
level, we just need to see if they can push that down to raw pointers in
unsafe code.
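
In C terms it's roughly the difference restrict makes today, except that
Rust references give you that guarantee everywhere, soundly (toy
example, obviously not how you'd spell it in the kernel):

/* Without restrict, the compiler has to assume the store to *sum might
 * feed a later p[i] load, so the store happens on every iteration and
 * the loads can't be hoisted past it: */
void sum_may_alias(unsigned *sum, const unsigned *p, unsigned n)
{
	for (unsigned i = 0; i < n; i++)
		*sum += p[i];
}

/* With restrict - roughly the guarantee a Rust &mut reference carries -
 * the accumulator can live in a register and be stored once at the
 * end: */
void sum_no_alias(unsigned *restrict sum, const unsigned *restrict p,
		  unsigned n)
{
	for (unsigned i = 0; i < n; i++)
		*sum += p[i];
}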

But consider what the world would look like if Rust fixes aliasing and
we get a microarchitecture able to take advantage of it. Build a
microarchitecture that spends some effort on ALU ops to get rid of as
many branches as possible (e.g. min/max, all your range checks that
don't trap), get rid of the spills and reloads from aliasing so you're
primarily running out of registers - and now you _do_ have enough
instructions in a basic block, with fixed latency, that you can schedule
at compile time to make VLIW worth it.

I don't think it's that big of a leap. Lack of cooperation between
hardware and compiler folks (and the fact that what the hardware people
wanted was impossible at the time) was what killed Itanium, so if you
fix those two things...

> The kernel basically turns all that off, as much as possible. Overflow
> isn't undefined in the kernel. Aliasing isn't undefined in the kernel.
> Things like that.

Yeah, the religion of undefined behaviour in C has been an absolute
nightmare.

It's not just the compiler folks, though; that way of thinking has
infected entirely too many people in kernel and userspace -
"performance is the holy grail and all that matters and thou shalt shave
every single damn instruction".

Where this really comes up for me is assertions, because we're not
giving great guidance there. It's always better to hit an assertion than
walk off into undefined behaviour la la land, but people see "thou shalt
not crash the kernel" as a reason not to use BUG_ON() when it _should_
just mean "always handle the error if you can't prove that it can't
happen".

> When 'integer overflow' means that you can _sometimes_ remove one
> single ALU operation in *some* loops, but the cost of it is that you
> potentially introduced some seriously subtle security bugs, I think we
> know it was the wrong thing to do.

And those branches just _do not matter_ in practice, since if one side
leads to a trap they're perfectly predicted, and to a first
approximation we're always bottlenecked on memory.
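
i.e. the whole cost of a checked add is the flags test plus a branch to
a cold path that's never taken in normal operation (made-up helper, just
to illustrate):

#include <linux/errno.h>
#include <linux/overflow.h>
#include <linux/types.h>

/* One add (which sets flags anyway) plus one never-taken, perfectly
 * predicted branch to the error path: */
static int grow_size(size_t old_size, size_t extra, size_t *new_size)
{
	if (check_add_overflow(old_size, extra, new_size))
		return -EOVERFLOW;

	return 0;
}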