Re: C aggregate passing (Rust kernel policy)

From: Linus Torvalds
Date: Sat Feb 22 2025 - 16:23:08 EST


On Sat, 22 Feb 2025 at 12:54, H. Peter Anvin <hpa@xxxxxxxxx> wrote:
>
> VLIW and OoO might seem orthogonal, but they aren't – because they are
> trying to solve the same problem, combining them either means the OoO
> engine can't do a very good job because of false dependencies (if you
> are scheduling molecules) or you have to break the instructions down
> into atoms, at which point it is just a (often quite inefficient) RISC
> encoding.

Exactly. Either you end up tracking things at bundle boundaries - and
screwing up your OoO - or you end up tracking things as individual
ops, and then all the VLIW advantages go away (but the disadvantages
remain).
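
As a rough illustration of that trade-off, here is a toy model (not a
description of any real core). It computes finish cycles for two
hypothetical bundles, once with dependencies tracked per op and once
per bundle. The Op type, the register names and the latencies are all
made up for the example, and it only models read-after-write
dependencies with unlimited issue width.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    name: str      # label for the op (hypothetical)
    dst: str       # destination register
    srcs: tuple    # source registers
    latency: int   # cycles from start to result


def op_level_finish(bundles):
    """Per-op tracking: each op starts once its own sources are ready."""
    ready = {}    # register -> cycle its value becomes available
    finish = {}   # op name -> cycle the op completes
    for bundle in bundles:
        # Ops in one bundle read the register state from before the bundle.
        starts = {
            op.name: max((ready.get(s, 0) for s in op.srcs), default=0)
            for op in bundle
        }
        for op in bundle:
            finish[op.name] = starts[op.name] + op.latency
            ready[op.dst] = finish[op.name]
    return finish


def bundle_level_finish(bundles):
    """Per-bundle tracking: a bundle starts once every source of every op
    in it is ready, and all of its results appear when the slowest op is done."""
    ready = {}
    finish = {}
    for bundle in bundles:
        start = max(
            (ready.get(s, 0) for op in bundle for s in op.srcs), default=0
        )
        done = start + max(op.latency for op in bundle)
        for op in bundle:
            finish[op.name] = done
            ready[op.dst] = done
    return finish


# Bundle 0 pairs a slow load with a fast add; bundle 1 only needs the add.
BUNDLES = [
    [Op("load", "r1", ("r0",), 10), Op("add", "r2", ("r3", "r4"), 1)],
    [Op("mul", "r5", ("r2", "r2"), 3), Op("sub", "r6", ("r7", "r8"), 1)],
]

if __name__ == "__main__":
    print("per op:    ", op_level_finish(BUNDLES))
    print("per bundle:", bundle_level_finish(BUNDLES))
```

With these made-up numbers the mul finishes at cycle 4 when tracked
per op, but at cycle 13 when tracked per bundle, because it inherits a
false dependency on the load it never reads. The sub has no real
dependency at all and still waits for the whole first bundle.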

The only reason to combine OoO and VLIW is because you started out
with a bad VLIW design (*cough*itanium*cough*) and it turned into a
huge commercial success (oh, not itanium after all, lol), and now you
need to improve performance while keeping backwards compatibility.

So at that point you make it OoO to make it viable, and the VLIW side
remains as a bad historical encoding / semantic footnote.

> In short, VLIW *might* make sense when you are statically
> scheduling a known pipeline, but it is basically a dead end for
> evolution – so unless you can JIT your code for each new chip
> generation...

.. which is how GPUs do it, of course. So in specialized environments,
VLIW works fine.

Linus